ADAPTIVE DENSITY ESTIMATION UNDER WEAK DEPENDENCE

GANNAZ IRENE (1) AND WINTENBERGER OLIVIER (2) 
Abstract. Assume that $(X_t)_{t\in\mathbb{Z}}$ is a real valued time series admitting a common marginal
density $f$ with respect to Lebesgue's measure. Donoho et al. (1996) propose near-minimax estimators
$\hat f_n$ based on thresholding wavelets to estimate $f$ on a compact set in an independent and
identically distributed setting. The aim of the present work is to extend these results to general weak
dependent contexts. Weak dependence assumptions are expressed as decreasing bounds of covariance
terms and are detailed for different examples. The threshold levels in estimators $\hat f_n$ depend
on weak dependence properties of the sequence $(X_t)_{t\in\mathbb{Z}}$ through a constant. If these properties
are unknown, we propose cross-validation procedures to get new estimators. These procedures
are illustrated via simulations of dynamical systems and non causal infinite moving averages. We
also discuss the efficiency of our estimators with respect to the decrease of covariance bounds.

1. Introduction

Let $(X_t)_{t\in\mathbb{Z}}$ be a real valued time series admitting a common marginal density $f$ that is
compactly supported. The general purpose of this paper is to estimate $f$ by wavelet estimators $\hat f_n$
constructed from $n$ observations $(X_1, \dots, X_n)$. In their seminal paper, Donoho et al. (1996) [8]
showed that projection-like linear estimators are not optimal and investigated the introduction of
nonlinearity via thresholds of wavelet coefficients. Wavelet thresholding provides estimators which
adapt themselves to the unknown smoothness of $f$; we refer to Vannucci (1998) [26] for a survey
of the use of wavelet bases in density estimation. The present work extends near-minimax results
of soft and hard-threshold estimators from the independent and identically distributed (iid for
short) framework, see Theorem 5 in Donoho et al. (1996) [8], to cases where weak dependence
between variables occurs.






Our main assumptions give bounds for covariance terms as decreasing sequences which tend
to zero when the gap between the past and the future of the time series goes to infinity. In
order to give examples satisfying these conditions, we introduce coefficients that give bounds of
covariance terms and that are computable for a large class of models. Weak dependence coefficients,
introduced by Doukhan and Louhichi (1999) [9], like mixing ones, are well adapted to that purpose.
Using $\beta$-mixing coefficients, Tribouley and Viennet (1998) proposed minimax estimators with
respect to the Mean Integrated Square Error (MISE for short). Comte and Merlevede (2002)
obtained near-minimax results using $\alpha$-mixing coefficients. The loss of a logarithmic factor in the
convergence rate in this last paper is balanced by the generality of the context, as the class of
$\alpha$-mixing models is larger than that of $\beta$-mixing models.

However, $\alpha$- and $\beta$-mixing coefficients are not easy to compute for some models and are useless
for others; Andrews (1984) proved that the mixing coefficients of the stationary solution of
the AR(1) model

$$X_t = \tfrac{1}{2}\,(X_{t-1} + \xi_t), \quad \text{where } (\xi_t)_{t\in\mathbb{Z}} \text{ is iid with a Bernoulli law of parameter } 1/2, \tag{1.1}$$

do not tend to zero as the gap from the past to the future of the time series goes to infinity.
Mixing coefficients do not behave nicely in this case because, through a reversion of time, the Markov
chain solution of (1.1) is a dynamical system, i.e. $X_{t-1} = T(X_t)$ for some transformation $T$,
namely $T(x) = 2x\,\mathbf{1}_{0\le x<1/2} + (2x-1)\,\mathbf{1}_{1/2\le x\le 1}$. So-called weak dependence coefficients have been
recently developed to deal with such processes, see Dedecker et al., Maume-Deschamps [20]
and references therein. Introduced by Dedecker and Prieur (2005), $\tilde\phi$-weak dependence
coefficients give sharp bounds on the covariance terms of dynamical systems, such as the stationary
solution to (1.1). Using these coefficients, we prove near-minimax results of thresholded wavelet
estimators for dynamical systems called expanding maps. To our knowledge, only non adaptive
density estimation has been studied in this non-mixing context, see for instance Bosq and Guegan
[2], Prieur and Maume-Deschamps [20].
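
The time-reversal property of the AR(1) example can be checked numerically. The following is a minimal Python sketch, not part of the original study: it simulates the recursion (1.1) in exact rational arithmetic and verifies that $X_{t-1} = T(X_t)$ along the path. The function names, the starting value 1/3, the seed and the path length are illustrative assumptions.

```python
from fractions import Fraction
import random


def doubling_map(x):
    """T(x) = 2x on [0, 1/2) and 2x - 1 on [1/2, 1]."""
    half = Fraction(1, 2)
    return 2 * x if x < half else 2 * x - 1


def simulate_ar1(n, x0=Fraction(1, 3), seed=0):
    """Path X_0, ..., X_n of X_t = (X_{t-1} + xi_t) / 2, xi_t iid Bernoulli(1/2)."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n):
        xi = rng.randint(0, 1)            # Bernoulli(1/2) innovation
        path.append((path[-1] + xi) / 2)  # exact arithmetic with Fraction
    return path


path = simulate_ar1(200)
# Going backwards in time, the chain is the deterministic orbit of T.
reversal_holds = all(
    doubling_map(path[t]) == path[t - 1] for t in range(1, len(path))
)
print(reversal_holds)
```

Exact fractions are used instead of floats so that the equality test is not blurred by rounding; the past value is recovered from the present one without any randomness, which is why mixing coefficients cannot decay here.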



Another advantage of our approach is that it treats many other contexts of dependence at once.
We prove that near-minimaxity still holds for a very large class of models using $\lambda$-weak
dependence coefficients, defined by Doukhan and Wintenberger (2007) [14]. We pay for generality by
adding conditions on the joint densities of the couples $(X_0, X_r)$ for all $r > 0$. These conditions
are not restrictive as they are satisfied for many econometric models such as ARMA, GARCH, ARCH,
LARCH and MA models.



The estimation scheme is based on Donoho et al.'s procedures developed in [8] for the iid case,
and it is adaptive with respect to the regularity of $f$. Soft and hard-threshold levels $(\lambda_j)_{j_0\le j\le j_1}$
are chosen equal to $K\sqrt{j/n}$ for some $K > 0$. Note that the constant $K$ and the highest resolution
level $j_1$ depend on the weak dependence properties of the observations. If weak dependence
properties are known, estimators are near-minimax: the same rates as in the iid setting are achieved
for mean-$L^p$ errors with $1 \le p < \infty$. If weak dependence properties are unknown, we develop
cross-validation procedures to approximate the threshold levels $\lambda_j$ and the highest resolution level $j_1$.
We check on simulations that the corresponding estimators are adaptive with respect to the regularity
of $f$ and to the weak dependence properties of $(X_t)_{t\in\mathbb{Z}}$. The orders of the approximation errors in the
dependent cases are very close to those in the iid case.



We believe that we obtain such good results because we work on simulations of processes satisfying
our main Assumption (D). This assumption consists of the exponential decay of the covariance
terms. We also give in this paper a simulation study of dynamical systems that do not satisfy
Assumption (D). Comparing the behavior of our estimators with kernel ones shows that ours
are less efficient when covariance terms do not decrease exponentially fast. We also prove that the
error terms of our estimators are unbounded in some cases, due to covariance terms that
decrease too slowly. This is a restriction of our procedure, based on thresholding wavelet coefficients.



The paper is structured as follows. In Section 2 we give notation, introduce estimation
procedures and formulate weak dependence assumptions. Main results are given in Section 3 and
examples of models in Section 4. Cross-validation procedures and their accuracy on simulations
are developed in Section 5. Proofs are relegated to the last Section.



2. Preliminaries 

2.1. Notation. We restrict ourselves to the estimation of a density $f$ which is compactly
supported. We suppose that $f$ is supported by $[-B, B]$ for some $B > 0$. For all $p \ge 1$, $L^p$ denotes
the space of all functions $f$ such that $\|f\|_p^p = \int |f(x)|^p\,dx < \infty$.

2.2. Density estimators. Throughout the paper, we work within an $r$-regular orthonormal
multiresolution analysis of $L^2$ (endowed with the usual inner product), associated with a
compactly supported scaling function $\phi$ and a compactly supported mother wavelet $\psi$. Without loss
of generality, we suppose that the supports of $\phi$ and $\psi$ are included in an interval $[-A, A]$
for some $A > 0$. Let us recall that $\phi$ and $\psi$ generate orthonormal bases by dilations and
translations: for a given primary resolution level $j_0$, the functions
$\{\phi_{j,k} : x \mapsto 2^{j/2}\phi(2^j x - k)\}_{k\in\mathbb{Z}}$ and $\{\psi_{j,k} : x \mapsto 2^{j/2}\psi(2^j x - k)\}_{k\in\mathbb{Z}}$
are such that the family

$$\{\phi_{j_0,k},\ k\in\mathbb{Z};\ \psi_{j,k},\ j \ge j_0,\ k\in\mathbb{Z}\}$$

is an orthonormal basis of $L^2$. Any function $f \in L^2$ can thus be decomposed as

$$f = \sum_{k\in\mathbb{Z}} \alpha_{j_0,k}\,\phi_{j_0,k} + \sum_{j=j_0}^{\infty}\sum_{k\in\mathbb{Z}} \beta_{j,k}\,\psi_{j,k},$$

where $\alpha_{j,k} = \int f(x)\phi_{j,k}(x)\,dx$ and $\beta_{j,k} = \int f(x)\psi_{j,k}(x)\,dx$.

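The nonlinear estimator developed in [8] is defined by the equation

$$\hat f_n = \sum_{k\in\mathbb{Z}} \hat\alpha_{j_0,k}\,\phi_{j_0,k} + \sum_{j=j_0}^{j_1}\sum_{k\in\mathbb{Z}} \gamma_{\lambda_j}(\hat\beta_{j,k})\,\psi_{j,k},$$

where $\hat\alpha_{j,k} = n^{-1}\sum_{i=1}^n \phi_{j,k}(X_i)$ and $\hat\beta_{j,k} = n^{-1}\sum_{i=1}^n \psi_{j,k}(X_i)$, and where $\gamma_\lambda$ is a threshold
function of level $\lambda$. The authors consider both hard and soft thresholding functions, corresponding
respectively to $\gamma_\lambda(\beta) = \beta\,\mathbf{1}_{|\beta|>\lambda}$ and $\gamma_\lambda(\beta) = (|\beta| - \lambda)_+\,\mathrm{sign}(\beta)$. If no distinction is made in the
sequel, both hard and soft thresholding estimators are concerned.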
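
The two thresholding rules can be written compactly in code. Below is a minimal NumPy sketch under the usual definitions: the hard rule keeps a coefficient unchanged when its absolute value exceeds the level and sets it to zero otherwise, while the soft rule additionally shrinks the surviving coefficients towards zero by the level. The function names and the sample coefficients are illustrative assumptions.

```python
import numpy as np


def hard_threshold(beta, lam):
    """Hard rule: beta * 1{|beta| > lam}, applied elementwise."""
    beta = np.asarray(beta, dtype=float)
    return np.where(np.abs(beta) > lam, beta, 0.0)


def soft_threshold(beta, lam):
    """Soft rule: (|beta| - lam)_+ * sign(beta), applied elementwise."""
    beta = np.asarray(beta, dtype=float)
    return np.sign(beta) * np.maximum(np.abs(beta) - lam, 0.0)


coeffs = np.array([-1.5, -0.2, 0.0, 0.4, 2.0])
print(hard_threshold(coeffs, 0.5))
print(soft_threshold(coeffs, 0.5))
```

Both rules discard exactly the same coefficients; they differ only in how the surviving ones are treated, which is why the two estimators share the same set of active wavelets for a given level.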
Let $s, \pi, r$ be three positive real numbers satisfying $s + 1/2 - 1/\pi > 0$. We assume that $f$
belongs to the Besov ball $B^s_{\pi,r}(M_1)$ on the real line, i.e. $\|f\|_{s,\pi,r} \le M_1$ where

$$\|f\|_{s,\pi,r} = \|\alpha_{j_0,\cdot}\|_{\ell_\pi} + \Big(\sum_{j\ge j_0}\Big(2^{j(s\pi+\pi/2-1)}\sum_{k\in\mathbb{Z}} |\beta_{j,k}|^{\pi}\Big)^{r/\pi}\Big)^{1/r}.$$

Approximation errors of an estimator $\hat f_n$ are expressed as $\mathbb{E}\|\hat f_n - f\|_p^p$ for $p \ge 1$. Associated
minimax rates are the best convergence decrease $\alpha$ of the worst approximation error we may
achieve over all estimators $\hat f_n$:

$$\inf_{\hat f_n}\ \sup_{f\in B^s_{\pi,r}(C)} \mathbb{E}\|\hat f_n - f\|_p^p = O\big(n^{-\alpha p}\big).$$

The minimax rate $\alpha$ is given by:

$$\alpha = \begin{cases} \alpha_+ = s/(1+2s) & \text{if } \epsilon \ge 0,\\ \alpha_- = (s - 1/\pi + 1/p)/(1 + 2s - 2/\pi) & \text{if } \epsilon \le 0, \end{cases} \qquad \text{where } \epsilon = s\pi - (p-\pi)/2. \tag{2.1}$$
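
To make the two regimes of the minimax rate concrete, here is a small Python sketch that evaluates the exponent from the smoothness $s$, the Besov integrability index $\pi$ and the loss index $p$, assuming the rate is $s/(1+2s)$ in the dense regime and $(s - 1/\pi + 1/p)/(1 + 2s - 2/\pi)$ in the sparse regime, with $\epsilon = s\pi - (p-\pi)/2$ deciding between them. The function name and the numerical values are illustrative assumptions. The two expressions agree when $\epsilon = 0$, so the boundary case can be assigned to either branch.

```python
def minimax_rate(s, pi, p):
    """Minimax exponent alpha for smoothness s, Besov index pi and L^p loss."""
    eps = s * pi - (p - pi) / 2.0
    if eps >= 0:
        # dense regime: the usual nonparametric rate
        return s / (1.0 + 2.0 * s)
    # sparse regime: the rate depends on both pi and p
    return (s - 1.0 / pi + 1.0 / p) / (1.0 + 2.0 * s - 2.0 / pi)


print(minimax_rate(s=1.0, pi=2.0, p=2.0))  # dense regime
print(minimax_rate(s=1.0, pi=1.0, p=4.0))  # sparse regime
```

For instance, with $s = 1$, $\pi = 1$ and $p = 3$ one gets $\epsilon = 0$ and both formulas return $1/3$.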



2.3. Weak dependence assumption. Throughout the paper, the symbol $\delta$ denotes with no
distinction $\phi$ or $\psi$, and $\tilde\delta_{j,k}(x) = \delta_{j,k}(x) - \mathbb{E}\,\delta_{j,k}(X_0)$ for all integers $j \ge 0$ and $k$. Define for all
positive integers $u, v$ the quantities

$$C^{j,k}_{u,v}(r) = \sup_{\max_i (s_{i+1}-s_i) = s_{u+1}-s_u = r} \big|\mathrm{Cov}\big(\tilde\delta_{j,k}(X_{s_1})\cdots\tilde\delta_{j,k}(X_{s_u}),\ \tilde\delta_{j,k}(X_{s_{u+1}})\cdots\tilde\delta_{j,k}(X_{s_{u+v}})\big)\big|. \tag{2.2}$$

Functions $\phi$ and $\psi$ play a symmetric role through $\delta$ in this setting. As stressed in [9], bounds on
covariance terms $C^{j,k}_{u,v}(r)$ are useful to extend asymptotic results from the iid case. Now we can
state the main assumption of this paper:

(D): There exists a sequence $\rho(r)$ such that for all $r \ge 0$ and all indices $j, k, u, v$, we have

$$C^{j,k}_{u,v}(r) \le \frac{u+v+uv}{2}\,\big(2^{j/2}M_2\big)^{u+v-2}\rho(r), \quad \text{where } M_2 \text{ is a constant satisfying } \|\delta\|_\infty \le M_2. \tag{D1}$$

Moreover, there exist real numbers $a, b, C_0 > 0$ depending only on $\delta$, $f$ and on the
dependence properties of $(X_t)_{t\in\mathbb{Z}}$ such that

$$\rho(r) \le C_0\exp(-a r^b) \quad \text{for all } r \ge 0. \tag{D2}$$

In Section 4 we give explicit conditions on the stationary process $(X_t)_{t\in\mathbb{Z}}$ ensuring that it satisfies
Assumption (D). When possible, the values of the constants $a$, $b$, $C_0$ and $M_2$ are given.



3. Main results 



Let $(X_t)_{t\in\mathbb{Z}}$ be a stationary real valued time series.
3.1. A useful lemma. Under conditions similar to Assumption (D), moment inequalities of
even orders and Bernstein-type inequalities for the sums $\sum_{i=1}^n \tilde\delta_{j,k}(X_i)$ are respectively given in
(d) and (0). The following Lemma recalls these inequalities applied to the quantities $\hat\beta_{j,k}$. They
remain valid for $\hat\alpha_{j,k}$ as $\phi$ and $\psi$ play the same role in (D).

Lemma 3.1. If (D) holds then for all even integers $q \ge 2$, for all $\lambda \ge 0$ and for $j, k$ such that
$0 \le j \le \log n$ and $0 \le k \le 2^j - 1$:

P (\hk ~ Pi A > A) < 2exp - C3(2J/n)b/(2(1 ^ ))( ^ A)(2+3fe)/(1+2b) ) , (3.2) 

where $C_1$, $C_2$ and $C_3$ are constants depending on $q$ and on the constants of Assumption (D): $a$,
$b$, $C_0$ and $M_2$.



The proof of this Lemma is given in Section 6.1. Notice that $C_3$ depends deeply on the
dependence context through $C_0$, see Propositions 4.1 and 4.2 for more details.



3.2. Near-minimax results of thresholded wavelet estimators. The following results are
extensions to weak dependence settings of Theorem 5 of [8].

Theorem 3.1. Suppose that $f \in B^s_{\pi,r}(M_1)$ with $1/\pi < s < N/2$ where $N$ is the regularity of the
function $\psi$. If (D) holds, then for each $1 \le p < \infty$ there exists a constant $C(N, p, a, b, C_0, M_1, M_2, A, B)$
such that

$$\mathbb{E}\|\hat f_n - f\|_p^p \le C \begin{cases} \left(\dfrac{\log n}{n}\right)^{p\alpha} & \text{if } \epsilon \ne 0,\\[2ex] \left(\dfrac{\log n}{n}\right)^{p\alpha}(\log n)^{(p/2-(1\wedge\pi)/r)_+} & \text{if } \epsilon = 0, \end{cases}$$

where the minimax rate $\alpha$ and the parameter $\epsilon$ are given in (2.1). Here $j_0$ is chosen as the smallest
integer larger than $\log(n)(1+N)^{-1}$, $j_1$ is the largest integer smaller than $\log(n\log^{-2/b-3}(n))$ and
$\lambda_j = K\sqrt{j/n}$ for a sufficiently large constant $K > 0$.
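
The theoretical threshold levels grow like the square root of the resolution level. The following minimal Python sketch tabulates $\lambda_j = K\sqrt{j/n}$ over a range of levels; the function name and the values chosen for $K$, $j_0$ and $j_1$ are purely illustrative assumptions, since the theorem only requires $K$ to be sufficiently large and ties $j_0$ and $j_1$ to the sample size and the dependence structure.

```python
import math


def threshold_levels(n, j0, j1, K):
    """Return {j: K * sqrt(j / n)} for resolution levels j0 <= j <= j1."""
    if j1 < j0:
        raise ValueError("j1 must be at least j0")
    return {j: K * math.sqrt(j / n) for j in range(j0, j1 + 1)}


levels = threshold_levels(n=1024, j0=2, j1=6, K=1.0)
for j, lam in levels.items():
    print(j, round(lam, 4))
```

In practice the constant $K$ is the delicate ingredient, because it depends on the unknown dependence parameters; this is what motivates a data-driven choice of the levels.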



A sketch of the proof of this Theorem is given in Section 6.2.

We refer to [21] for the definition of the parameter $N$, the regularity of the wavelet function
$\psi$. The condition $s < N/2$ ensures the sparsity of the wavelet coefficients of the density $f$. This
condition is not restrictive as $N$ can be chosen sufficiently large in practice.

The estimators $\hat f_n$ are the same as in the iid case given in [8], except for the highest resolution
level $j_1$ which is smaller here. This restriction is needed in the weak dependence context because
the Bernstein-type inequality (3.2) is not as sharp as the one in the iid case. However, this
restriction does not perturb the rate of convergence, which is the same as the one obtained in the
iid case by [8].
The constant $K > 0$ plays a key role in the asymptotic behavior of $\hat f_n$. From (3.2), we infer that
the constant $K$ depends on the parameters $a$, $b$, $C_0$, $M_2$ through $C_1$, $C_2$, $C_3$ in an intricate way.
How large the constant $K$ must be therefore depends strongly on the dependence structure
of the observations $(X_1, \dots, X_n)$. Contrary to the iid case, we are thus not able to develop direct
procedures based on the observations $(X_1, \dots, X_n)$ that choose a convenient parameter $K$ like in
(Jill). In Section 5 we propose cross-validation procedures to determine the threshold levels $\lambda_j$
when the weak dependence properties of the process $(X_t)_{t\in\mathbb{Z}}$ are unknown.



4. Examples 



In this Section, we give examples of models that satisfy Assumption (D) using $\tilde\phi$ and $\lambda$-weak
dependence coefficients introduced respectively in Dedecker and Prieur (2005) and [14]. For each of
them, we proceed in two steps: first we give sufficient conditions on these coefficients ensuring
Assumption (D) in Propositions 4.1 and 4.2, and secondly we give in Subsections 4.2 and 4.4 examples
of models satisfying such conditions.



A Lipschitz function $h : \mathbb{R}^u \to \mathbb{R}$ for some $u \in \mathbb{N}^*$ is a function such that $\mathrm{Lip}(h) < \infty$ with

$$\mathrm{Lip}(h) = \sup_{(a_1,\dots,a_u)\ne(b_1,\dots,b_u)} \frac{|h(a_1,\dots,a_u) - h(b_1,\dots,b_u)|}{|a_1 - b_1| + \cdots + |a_u - b_u|}.$$

As Lipschitz functions play an important role in weak dependence contexts we restrict ourselves
to the cases where $N > 4$. This assumption on the regularity of the wavelet functions implies that
$\phi$ and $\psi$ can be chosen as Lipschitz functions, as established in Q). Note also that $\|f\|_\infty < \infty$
as $f \in B^s_{\pi,r}(M_1)$, see equation (15) in [8]. For convenience, we denote with no distinction
$\psi_{j,k}(x) - \beta_{j,k}$ and $\phi_{j,k}(x) - \alpha_{j,k}$ as $\tilde\delta_{j,k}(x)$ for any integers $j \ge 0$ and $k$.



Weak dependence coefficients are meant to generalize mixing ones. Let us recall that $\alpha$-mixing
coefficients can be defined in a similar way by two equations:

$$\alpha(r) = \sup_{\substack{\ell \ge 1\\ i + r \le j_1 < \cdots < j_\ell\\ \|g\|_\infty \le 1}} \mathbb{E}\,\big|\mathbb{E}\big(g(X_{j_1},\dots,X_{j_\ell}) \mid \sigma(\{X_j,\ j \le i\})\big) - \mathbb{E}\big(g(X_{j_1},\dots,X_{j_\ell})\big)\big|,$$

$$\alpha(r) = \sup_{\substack{(u,v)\in\mathbb{N}^*\times\mathbb{N}^*\\ i_1 < \cdots < i_u < i_u + r \le i_{u+1} < \cdots < i_{u+v}\\ \|f\|_\infty,\ \|g\|_\infty \le 1}} \big|\mathrm{Cov}\big(f(X_{i_1},\dots,X_{i_u}),\ g(X_{i_{u+1}},\dots,X_{i_{u+v}})\big)\big|.$$

These coefficients measure the dependence between the past and the future values of the process
$(X_t)_{t\in\mathbb{Z}}$ as the gap $r$ between past and future goes to infinity. As these coefficients are often too
restrictive, the authors of (0) relax them by considering a supremum taken over functions with
bounded variation or over bounded Lipschitz functions rather than uniformly bounded functions.
These different choices of function sets lead to different coefficients of weak dependence, namely
$\tilde\phi$ and $\lambda$ respectively. We give hereafter the precise definition of $\tilde\phi$ and $\lambda$-weak dependence
coefficients.
4.1. $\tilde\phi$-weak dependence. Functions of bounded variation are defined as follows: let $BV$ and
$BV_1$ denote the sets of functions $g$ supported on $[-A, A]$ satisfying respectively $\|g\|_{BV} < +\infty$
and $\|g\|_{BV} \le 1$ where

$$\|g\|_{BV} = |g(-A)| + \sup_{n\in\mathbb{N}}\ \sup_{a_0=-A<a_1<\cdots<a_n=A}\ \sum_{i=1}^{n} |g(a_i) - g(a_{i-1})|.$$

Let $(\Omega, \mathcal{A}, \mathbb{P})$ be a probability space, $\mathcal{M}$ a $\sigma$-algebra of $\mathcal{A}$ and let $X = (X_1, \dots, X_v)$, for $v \ge 1$,
be a collection of real valued random variables $X_i$ defined on $(\Omega, \mathcal{A}, \mathbb{P})$. We define the coefficient
$\tilde\phi$ as it was introduced in Dedecker and Prieur (2005) by the equation:

$$\tilde\phi(\mathcal{M}, X_1, \dots, X_v) = \sup_{g_1,\dots,g_v\in BV_1} \Big\| \int \prod_{i=1}^{v} g_i(x_i)\,\mathbb{P}_{X|\mathcal{M}}(dx) - \int \prod_{i=1}^{v} g_i(x_i)\,\mathbb{P}_X(dx) \Big\|_\infty,$$

where $dx = (dx_1, \dots, dx_v)$. The coefficients $\tilde\phi(r)$ are now defined by the equation

$$\tilde\phi(r) = \max_{\ell \ge 1}\ \frac{1}{\ell} \sup_{i + r \le j_1 < \cdots < j_\ell} \tilde\phi\big(\sigma(\{X_j;\ j \le i\}), X_{j_1}, \dots, X_{j_\ell}\big).$$

These coefficients are multivariate extensions of $\tilde\phi_1(r)$ defined in Q), see also [20]. Unlike
mixing coefficients, they efficiently treat the dependence structure of dynamical systems and
associated Markov chains. A process $(X_t)_{t\in\mathbb{Z}}$ is said to be $\tilde\phi$-weakly dependent if the sequence $\tilde\phi(r)$
goes to 0 when $r$ goes to infinity, i.e. when the gap between observations from the future and
observations from the past goes to infinity. Introducing $\tilde\phi$-weakly dependent processes is useful
in the present framework due to the following link with Assumption (D):

Proposition 4.1. Assume that $(X_t)_{t\in\mathbb{Z}}$ is a process such that there exist $a, b, c > 0$ satisfying

$$\tilde\phi(r) \le c\exp(-a r^b), \tag{4.1}$$

then Assumption (D) holds with $M_2 = 2(\|\delta\|_\infty + A\,\mathrm{Lip}\,\delta)$ and $C_0 = 4c\,(\|\delta\|_\infty + A\,\mathrm{Lip}\,\delta)\,\|f\|_\infty\|\delta\|_1$.



The proof of this Proposition is given in Subsection 6.3.



4.2. Examples of $\tilde\phi$-weakly dependent processes. Following the work of [7], we give a
general class of models where (4.1) is satisfied.

Lemma 4.1. Assume that $(X_t)_{t\in\mathbb{Z}}$ is a process satisfying the Markov property, taking values in
$[0, 1]$ and such that there exist constants $a, b, c > 0$ satisfying, for any functions $g, k$ with $g \in BV_1$
and $\mathbb{E}|k(X_0)| < \infty$, and for all $r \ge 0$,

$$|\mathrm{Cov}(k(X_0), g(X_r))| \le \mathbb{E}|k(X_0)|\exp(-a r^b), \tag{4.2}$$

$$\|\mathbb{E}(g(X_r) \mid X_0 = \cdot)\|_{BV} \le c, \tag{4.3}$$

then $\tilde\phi_v(r) \le c^v\exp(-a r^b)$ and the conclusions of Proposition 4.1 follow.

The proof of this Lemma is given in Subsection 6.3. Various examples of processes satisfying
the conditions of Lemma 4.1 are given in (0). We recall here the case of Markov chains obtained by
time reversing expanding maps, as they are extensions of Andrews' example (1.1).
Let us define stationary Markov chains $(X_t)_{t\in\mathbb{Z}}$ associated with dynamical systems through a
reversion of the time as non degenerate stationary solutions of the recurrence equation

$$X_t = T^i(X_{t-i}), \quad \forall t \in \mathbb{Z},\ i \in \mathbb{N}, \tag{4.4}$$

where $T : [0, 1] \to [0, 1]$ is a deterministic function.

Remark 1. For such Markov chains, mixing coefficients are useless. Future values are simply
functions of past values via (4.4), and it is then easy to check from their definitions that the
coefficients $\alpha(r)$ are constant. Thus $\alpha(r)$ does not tend to 0 and $\alpha$-mixing coefficients do not
evaluate the dependence of such processes.



A Markov chain $(X_t)_{t\in\mathbb{Z}}$ is associated with an expanding map through a reversion of the time
if $T$ satisfies



• (Regularity) The function $T$ is differentiable, with a continuous derivative $T'$, and there
exists a grid $0 = a_0 < a_1 < \cdots < a_k = 1$ such that $|T'| > 0$ on $]a_{i-1}, a_i[$ for each
$i = 1, \dots, k$.

• (Expansivity) For any integer $i$, let $I_i$ be the set on which the first derivative of $T^i$, $(T^i)'$,
is defined. There exist $a > 0$ and $s > 1$ such that $\inf_{x\in I_i} |(T^i)'(x)| > a s^i$.

• (Topological mixing) For any nonempty open sets $U, V$, there exists $i_0 \ge 1$ such that
$T^{-i}(U) \cap V \ne \emptyset$ for all $i \ge i_0$.



Under these three conditions a non degenerate stationary solution $(X_t)_{t\in\mathbb{Z}}$ to (4.4) exists and
has remarkable properties, see [27] for a nice survey. For instance, the process $(X_t)_{t\in\mathbb{Z}}$ satisfies
the conditions of Proposition 4.1 for some $a, c > 0$ and $b = 1$, see (@). Moreover, the marginal
distribution is absolutely continuous and its density belongs to $BV$. Noticing that
$B^1_{1,1} \subset BV \subset B^1_{1,\infty}$, Theorem 3.1 provides adaptive estimators of the marginal density of $(X_t)_{t\in\mathbb{Z}}$ in
that context and extends Theorem 2.2 of (0), where rates of the MISE for non adaptive estimators
are given in such a context.



4.3. $\lambda$-weak dependence. The stationary process $(X_t)_{t\in\mathbb{Z}}$ is $\lambda$-weakly dependent, as defined in
[14], if there exists a sequence of non-negative real numbers $\lambda(r)$ satisfying $\lambda(r) \to 0$ as $r \to \infty$
and such that:

$$\big|\mathrm{Cov}\big(h(X_{i_1},\dots,X_{i_u}),\ k(X_{i_{u+1}},\dots,X_{i_{u+v}})\big)\big| \le \big(u\,\|k\|_\infty\mathrm{Lip}(h) + v\,\|h\|_\infty\mathrm{Lip}(k) + uv\,\mathrm{Lip}(h)\,\mathrm{Lip}(k)\big)\,\lambda(r)$$

for all $p$-tuples $(i_1, \dots, i_p)$ with $i_1 \le \cdots \le i_u \le i_u + r \le i_{u+1} \le \cdots \le i_p$, where $p = u + v$, and for
all $h \in \Lambda_u$ and $k \in \Lambda_v$ where

$$\Lambda_u = \{h : \mathbb{R}^u \to \mathbb{R},\ \mathrm{Lip}(h) < \infty,\ \|h\|_\infty < \infty\}, \quad \text{for any } u \ge 1.$$

The $\lambda$-weak dependence provides simple bounds of the covariance terms:

$$\big|\mathrm{Cov}\big(\tilde\delta_{j,k}(X_{i_1})\cdots\tilde\delta_{j,k}(X_{i_u}),\ \tilde\delta_{j,k}(X_{i_{u+1}})\cdots\tilde\delta_{j,k}(X_{i_{u+v}})\big)\big| \le (u+v+uv)\,\|\tilde\delta_{j,k}\|_\infty^{u+v-2}\,(\mathrm{Lip}\,\tilde\delta_{j,k})^2\,\lambda(r).$$

The right hand side term is bounded by $(2^{j/2}\,2\|\delta\|_\infty)^{u+v-2}\,(\mathrm{Lip}\,\delta)^2\,2^{3j}\lambda(r)$. At this point, we do
not achieve Assumption (D1) as this bound depends on $j$ via an extra term $2^{3j}$. An additional
assumption is needed to ensure Assumption (D):

(J): The joint densities $f_r$ of $(X_0, X_r)$ exist and are bounded for $r > 0$.

Proposition 4.2. Assume that $(X_t)_{t\in\mathbb{Z}}$ is a $\lambda$-weakly dependent process satisfying (J) and such
that there exist $a', b', c' > 0$ and $a'', b'', c'' \ge 0$ with:

$$\lambda(r) \le c'\exp(-a' r^{b'}) \quad \text{and} \quad \|f_r\|_\infty \le c''\exp(a'' r^{b''}) \quad \text{for all } r > 0.$$

If $b'' < b'$ then (D) holds for some $C_0 > 0$ with $M_2 = 2\|\delta\|_\infty$, $a = a'/4$ and $b = b'$.



The proof of this Proposition is given in Subsection 6.3.



4.4. Examples of $\lambda$-weakly dependent processes. First we give in this Section two generic
$\lambda$-weakly dependent models, Bernoulli shifts and random processes with infinite connections.
Secondly we detail the conditions of Proposition 4.2 in three more specific models.

Let $H : \mathbb{R}^{\mathbb{Z}} \to [0, 1]$ be a measurable function. A Bernoulli shift with innovations $\xi_t$ is defined
as

$$X_t = H\big((\xi_{t-i})_{i\in\mathbb{Z}}\big), \quad t \in \mathbb{Z}.$$

According to [14], such Bernoulli shifts are $\lambda$-weakly dependent with $\lambda(r) \le 2v([r/2])$. Here, when
$(\xi_t)_{t\in\mathbb{Z}}$ is iid, $(v(r))_{r>0}$ is a non-increasing sequence satisfying

$$\mathbb{E}\,\big|H\big((\xi_j)_{j\in\mathbb{Z}}\big) - H\big((\xi'_j)_{j\in\mathbb{Z}}\big)\big| \le v(r) \quad \text{for all } r > 0,$$

where the iid sequence $(\xi'_j)_{j\in\mathbb{Z}}$ is such that $\xi'_j = \xi_j$ for $|j| \le r$ and $\xi'_j$ is independent of $\xi_j$ otherwise.

If the weak dependence coefficients $\lambda_\xi(r)$ refer to $(\xi_t)_{t\in\mathbb{Z}}$, we can compute those of $(X_t)_{t\in\mathbb{Z}}$. More
precisely, the result in (0) states that if there exists $\ell > 0$ such that

$$|H(x) - H(y)| \le b_s\,(\|z\|^{\ell} \vee 1)\,|x_s - y_s|,$$

for some sequence $b_j \ge 0$ satisfying $\sum_j |j|\,b_j < \infty$ and where $x, y \in \mathbb{R}^{\mathbb{Z}}$ coincide except for the
index $s \in \mathbb{Z}$ and $\|x\| = \sup_{i\in\mathbb{Z}} |x_i|$, and if $\mathbb{E}|\xi_0|^{m'} < \infty$ for some $m' > \ell + 1$, then $(X_t)_{t\in\mathbb{Z}}$ is $\lambda$-weakly
dependent with

$$\lambda(k) \le c \inf_{r \le [k/2]} \Big( \sum_{|j| \ge r} |j|\,b_j + (2r+1)^2\,\lambda_\xi(k - 2r)^{\frac{m'-1-\ell}{m'-1+\ell}} \Big)$$

for some $c > 0$.

Different values of $b$ in Assumption (D2) may arise naturally when evaluating the infimum in the
equation above, see below for some classical examples.



Another approach is the one of random processes with infinite connections considered in [12].
Let $F : [0, 1]^{\mathbb{Z}\setminus\{0\}} \times \mathbb{R} \to [0, 1]$ be measurable. Under suitable conditions on $F$, the stationary
solution of the equation

$$X_t = F\big((X_j)_{j \ne t},\ \xi_t\big), \quad \text{a.s.},$$

exists and is $\lambda$-weakly dependent. We refer the reader to [12] for more details and we end the
Section with some specific $\lambda$-weakly dependent models.



4.4.1. Infinite moving average. A Bernoulli shift is an infinite moving average process if

$$X_t = \sum_{i\in\mathbb{Z}} a_i\,\xi_{t-i}. \tag{4.5}$$

If $(\xi_t)_{t\in\mathbb{Z}}$ are iid random variables satisfying $\mathbb{E}|\xi_0| \le 1$ then $(X_t)_{t\in\mathbb{Z}}$ is $\lambda$-weakly dependent with
$\lambda(r) \le 4\sum_{|j| > [r/2]} |a_j|$. If $a_j \le K\alpha^{|j|}$ for $j \ne 0$, $K > 0$ and $0 < \alpha < 1$ then the condition on $\lambda$ in
Proposition 4.2 holds with $b' = 1$. Then Assumption (D2) is ensured with $b = 1$.
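
A two-sided moving average with geometrically decaying coefficients is easy to simulate once the infinite sum is truncated. The sketch below is a minimal NumPy illustration and not the procedure used in the paper: the truncation order `J`, the function names and the numerical values are assumptions made for the example, and the innovations are taken to be Bernoulli(1/2), which satisfy the moment condition on the innovations.

```python
import numpy as np


def geometric_coeffs(K, alpha, J):
    """Coefficients a_j = K * alpha**|j| for j = -J, ..., J."""
    j = np.arange(-J, J + 1)
    return K * alpha ** np.abs(j)


def truncated_moving_average(xi, coeffs):
    """X_t = sum_{|j| <= J} a_j xi_{t-j} wherever all innovations are available.

    xi has length n + 2J and coeffs has length 2J + 1, so the output has length n.
    """
    return np.convolve(xi, coeffs, mode="valid")


rng = np.random.default_rng(0)
J, n = 30, 1000
a = geometric_coeffs(K=1.0 / 3.0, alpha=0.5, J=J)
xi = rng.integers(0, 2, size=n + 2 * J).astype(float)  # Bernoulli(1/2) innovations
x = truncated_moving_average(xi, a)
print(x.shape, float(x.min()), float(x.max()))
```

With geometric decay the neglected tail of the sum is of order $\alpha^J$, so a moderate truncation order already gives an accurate approximation of the stationary solution.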



4.4.2. LARCH($\infty$) model. Let $(\xi_t)_{t\in\mathbb{Z}}$ be an iid centered real valued sequence and $a, a_j, j \in \mathbb{N}^*$
be real numbers. LARCH($\infty$) models are solutions of the recurrence equation

$$X_t = \xi_t\Big(a + \sum_{j \ge 1} a_j X_{t-j}\Big). \tag{4.6}$$

The stationary solution of (4.6) satisfies the condition on $\lambda$ in Proposition 4.2 with $b' = 1/2$ if
there exist $K > 0$ and $0 < \alpha < 1$ such that $a_j \le K\alpha^{j}$ for all $j \ne 0$. Then Assumption (D2) is
ensured with $b = 1/2$. See [11] for applications of this model in econometrics.



4.4.3. Affine model. Let us consider the stationary solution $(X_t)_{t\in\mathbb{Z}}$ of the equation

$$X_t = M(X_{t-1}, X_{t-2}, \dots)\,\xi_t + f(X_{t-1}, X_{t-2}, \dots),$$

where $M$ and $f$ are both Lipschitz functions. This model contains various time series processes
such as ARCH, GARCH, ARMA, ARMA-GARCH, etc. If the $\xi_t$ are iid random variables with a
bounded marginal density then (J) holds and the joint densities are uniformly bounded, as stated
in the Appendix of [13]. Moreover if the functions $M$ and $f$ have exponentially decreasing
Lipschitz coefficients, then the conditions of Proposition 4.2 hold with $b' = 1/2$, $b'' = 0$ and Assumption
(D2) follows with $b = 1/2$.



5. Cross-validation procedures and simulations 



The aim of this Section is to evaluate the applicability and the quality of the procedure on
simulated data. Even if the estimators are adaptive with respect to the regularity of the density
function, a constant appears in the threshold levels that we cannot calibrate if the dependence
properties of the observations are unknown. We therefore develop a cross-validation scheme in order
to apply the estimator concretely on simulated data. We investigate several examples of dependence
for the simulated observations that satisfy the convergence result of this paper. We also give
counter-examples where the estimators fail to achieve the near-minimax rate.
5.1. Cross-validation procedures. According to Theorem 3.1, let us fix $j_0$ as the smallest
integer larger than $\log(n)(1+N)^{-1}$ and define $\hat f_n^{HTCV}$ and $\hat f_n^{STCV}$ respectively as the hard and
soft-thresholding estimators associated with threshold levels $(\hat\lambda_j)_{j_0\le j\le j^*}$. Here $j^* = \log_2 n$ and

$(\hat\lambda_j)_{j_0\le j\le j^*}$ are determined by cross-validation procedures: $\hat\lambda_j = \mathrm{Argmin}_\lambda\, CV_j(\lambda)$, where the
cross-validation criteria are respectively defined by the equations



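$$\text{HTCV:}\quad CV_j(\lambda) = \sum_{k} \mathbf{1}_{|\hat\beta_{j,k}|>\lambda}\Big[\hat\beta_{j,k}^2 - \frac{2}{n(n-1)}\sum_{1\le i\ne h\le n} \psi_{j,k}(X_i)\,\psi_{j,k}(X_h)\Big],$$

$$\text{STCV:}\quad CV_j(\lambda) = \sum_{k} \mathbf{1}_{|\hat\beta_{j,k}|>\lambda}\Big[\hat\beta_{j,k}^2 - \frac{2}{n(n-1)}\sum_{1\le i\ne h\le n} \psi_{j,k}(X_i)\,\psi_{j,k}(X_h) + \lambda^2\Big].$$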



These criteria are obtained by approximating the coefficients $\beta_{j,k}$ by $\hat\beta_{j,k}$ and the products
$\beta_{j,k}\hat\beta_{j,k}$ by $\sum_{i\ne h}\psi_{j,k}(X_i)\psi_{j,k}(X_h)/(n(n-1))$ in the Integrated Square Error.



The estimator $\hat j_1$ is defined as the smallest integer such that $CV_j(\hat\lambda_j) = 0$ for all $\hat j_1 < j \le j^*$.
Notice that cross-validation procedures may consider larger resolution levels than the estimators
$\hat f_n$ as $j^*$ is larger than $j_1$ given in Theorem 3.1.
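
The level-wise cross-validation step can be sketched in a few lines. The code below is a minimal NumPy illustration under an explicit assumption: the criterion at level $j$ is taken to be the sum, over the coefficients that survive the threshold, of $\hat\beta_{j,k}^2 - \frac{2}{n(n-1)}\sum_{i\ne h}\psi_{j,k}(X_i)\psi_{j,k}(X_h)$, plus $\lambda^2$ per surviving coefficient for the soft rule. This form follows from the two approximations described above, but it should be checked against the original equations. The function names and the candidate grid are hypothetical.

```python
import numpy as np


def cv_criterion(psi_values, lam, rule="hard"):
    """Cross-validation score of threshold lam at one resolution level.

    psi_values is an (n, K) array whose entry [i, k] is psi_{j,k}(X_i).
    """
    psi_values = np.asarray(psi_values, dtype=float)
    n = psi_values.shape[0]
    beta_hat = psi_values.mean(axis=0)
    col_sum = psi_values.sum(axis=0)
    # sum over i != h of psi(X_i) psi(X_h) = (sum_i psi)^2 - sum_i psi^2
    cross = (col_sum ** 2 - (psi_values ** 2).sum(axis=0)) / (n * (n - 1))
    terms = beta_hat ** 2 - 2.0 * cross
    if rule == "soft":
        terms = terms + lam ** 2
    elif rule != "hard":
        raise ValueError("rule must be 'hard' or 'soft'")
    kept = np.abs(beta_hat) > lam  # coefficients that survive the threshold
    return float(np.sum(terms[kept]))


def select_threshold(psi_values, candidates, rule="hard"):
    """Return the candidate level minimising the criterion (first one on ties)."""
    scores = [cv_criterion(psi_values, lam, rule) for lam in candidates]
    return candidates[int(np.argmin(scores))]
```

The double sum over pairs $i \ne h$ is never formed explicitly: it is recovered from the squared column sum minus the sum of squares, which keeps the cost linear in $n$ for each coefficient. When every coefficient is thresholded the score is zero, which is the situation used to define the data-driven highest resolution level.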



5.2. Different dependent samplings satisfying (D). To illustrate the behavior of this
cross-validation scheme, we simulate three different weak-dependence cases with the same marginal
absolutely continuous distribution $F$. The simulations were carried out as follows:



Case 1: Independent observations are given by $X_i = F^{-1}(U_i)$ where the $U_i$ are simulations
of independent variables, uniform on $[0, 1]$.

Case 2: A $\tilde\phi$-weakly dependent process is obtained by the equation $X_i = F^{-1}(G(Y_i))$ for
$i = 1, \dots, n$ with $G(x) = 2\sqrt{x(1-x)}/\pi$ and $(Y_i)_{i=1,\dots,n}$ given by an initial value $Y_1$
and, recursively, $Y_i = T^{i-1}(Y_1)$ for $2 \le i \le n$ with $T(x) = 4x(1-x)$.

Note that for all $1 \le i \le n$ the variable $Y_i$ admits the distribution function $G$, the invariant
distribution of $T$. Moreover the sequence $(Y_1, \dots, Y_n)$ satisfies the assumptions of Proposition
4.1, see (jH) for details.

Case 3: A $\lambda$-weakly dependent process resulting from the transform $X_i = F^{-1}(G(Y_i))$ of
variables $(Y_t)_{t\in\mathbb{Z}}$ which are solution of

$$Y_t = 2(Y_{t-1} + Y_{t+1})/5 + 5\xi_t/21,$$

where $(\xi_t)_{t\in\mathbb{Z}}$ is an iid sequence of Bernoulli variables with parameter 1/2. The stationary
solution of this equation admits the representation

$$Y_t = \sum_{j\in\mathbb{Z}} a_j\,\xi_{t-j},$$

where $a_j = \frac{1}{3}(1/2)^{|j|}$. This solution belongs to $[0, 1]$ and its marginal distribution $G$ is
the one of $(U + U' + \xi_0)/3$ where $U$ and $U'$ are independent variables following $\mathcal{U}([0, 1])$. For
$1 \le i \le n$, the solution $Y_i$ is approximated by $Y_i^{(N)}$, where $Y_t^{(j)}$, for $1 \le j \le N$ and
$j - N \le t \le n + N - j$, is generated according to the convergent algorithm given in [12]: the initial
values $Y_t^{(0)}$ are fixed equal to 0 and, given simulated variables $(\xi_i)_{-N\le i\le n+N}$, we define
recursively $Y_t^{(j)} = 2\big(Y_{t-1}^{(j-1)} + Y_{t+1}^{(j-1)}\big)/5 + 5\xi_t/21$ for $1 \le j \le N$ and $j - N \le t \le n + N - j$.
Errors of approximation are negligible as they decrease exponentially fast with the parameter
$N$, which we fix to $N = n$, see Lemma 6 of {HI) for more details. Moreover, Proposition 2.1 of
(jl) ensures that there exist $a, C > 0$ such that $\lambda(r) \le C\exp(-ar)$ for the process $(Y_t)_{t\in\mathbb{Z}}$
and consequently for $(X_t)_{t\in\mathbb{Z}}$.
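
The fixed-point algorithm of Case 3 can be sketched as follows. This is a minimal NumPy illustration that takes the recursion literally, with the two coefficients 2/5 and 5/21 exposed as parameters; the function name, the seed and the sizes used in the example are assumptions, and the number of sweeps is set below the sample size only to keep the example fast.

```python
import numpy as np


def noncausal_fixed_point(xi, n, N, c_y=2.0 / 5.0, c_xi=5.0 / 21.0):
    """Iterate Y_t^(j) = c_y * (Y_{t-1}^(j-1) + Y_{t+1}^(j-1)) + c_xi * xi_t.

    xi holds xi_t for t = -N, ..., n + N (length n + 2N + 1); array position p
    corresponds to time t = p - N. Returns Y_t^(N) for t = 1, ..., n.
    """
    xi = np.asarray(xi, dtype=float)
    if xi.shape != (n + 2 * N + 1,):
        raise ValueError("xi must have length n + 2N + 1")
    y = np.zeros_like(xi)  # initial values Y^(0) = 0
    for j in range(1, N + 1):
        lo, hi = j, n + 2 * N - j  # positions of t = j - N, ..., n + N - j
        new = y.copy()
        new[lo:hi + 1] = c_y * (y[lo - 1:hi] + y[lo + 1:hi + 2]) + c_xi * xi[lo:hi + 1]
        y = new
    return y[N + 1:N + n + 1]


rng = np.random.default_rng(1)
n, N = 200, 150
xi = rng.integers(0, 2, size=n + 2 * N + 1)  # Bernoulli(1/2) innovations
y = noncausal_fixed_point(xi, n, N)
print(y[:5])
```

Each sweep is a contraction with factor $2 \times 2/5 = 4/5$ in the supremum norm, so the distance to the stationary solution shrinks geometrically with the number of sweeps. The window of updated indices narrows by one on each side at every sweep, which is why the innovations are needed on the wider range from $-N$ to $n + N$.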



Two different density functions are considered. The first one is a mixture of a sine function
and a uniform distribution, presenting a discontinuity, and the second one is a mixture of two
Gaussian distributions.



5.3. Comparison of $\hat f_n^{HTCV}$ and $\hat f_n^{STCV}$. The first density considered is a mixture of a
sine function and a uniform distribution.

The calculations were carried out in MATLAB in a Unix environment. We considered $n = 2^{10}$
observations repeated $M = 500$ times for each of the three weak dependence cases. Once the data
were simulated we applied the cross-validation procedures. The usual DWT algorithm proposed by (0)
and implemented in the Wavelab toolbox by Donoho and his collaborators (available on [29])
only gives values of wavelet functions on an equidistant grid. As one needs to compute these
values at given data points, we consider an equidistant grid of $I$ points with the number of points
$I$ huge with respect to the number of observations $n$. Then we approximate the values $\psi_{j,k}(X_i)$
by $\psi_{j,k}([X_i I]/I)$ where $[x]$ denotes the closest integer to any real number $x$. The wavelets used
for the decomposition are Daubechies Symmlets with $N = 8$ zero-moments. Notice that another
possible scheme is the algorithm of (0) that gives directly the values $\psi_{j,k}(X_i)$. But as it needs
much more computation time it has not been used here for convenience.
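
The grid approximation described above amounts to a nearest-neighbour lookup in a precomputed table. The following NumPy sketch illustrates the idea on a generic function tabulated on $[0, 1]$; it does not call Wavelab or any wavelet library, the function names are hypothetical, and rescaling the data to $[0, 1]$ is an assumption of the example.

```python
import numpy as np


def tabulate_on_grid(func, num_points):
    """Values of func on the equidistant grid 0, 1/I, ..., 1 with I = num_points."""
    grid = np.arange(num_points + 1) / num_points
    return func(grid)


def lookup_nearest(table, x):
    """Approximate func(x) by the tabulated value at the closest grid point [x I] / I."""
    num_points = len(table) - 1
    idx = np.rint(np.asarray(x) * num_points).astype(int)
    idx = np.clip(idx, 0, num_points)  # guard against points outside [0, 1]
    return table[idx]


I = 2 ** 14
table = tabulate_on_grid(np.sin, I)  # stand-in for a tabulated wavelet
rng = np.random.default_rng(0)
x = rng.uniform(size=1000)
max_error = float(np.max(np.abs(lookup_nearest(table, x) - np.sin(x))))
print(max_error)
```

For a Lipschitz function the lookup error is at most the Lipschitz constant times half the grid step, which is why the number of grid points has to be large compared with the sample size, and larger still at fine resolution levels where the dilated wavelets vary quickly.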

Figures 1 and 2 represent the estimators $\hat f_n^{HTCV}$ and $\hat f_n^{STCV}$ and the true density
function $f$ in the different weakly dependent cases. The quality of the estimators is visually good.
According to Figures 1 and 2, the weak dependence properties of the simulated data do not seem
to affect either estimation procedure. The density estimators presented in Figures 1 and 2 do not
detect the discontinuity in the density. Actually, for any finite data set, not enough simulated
values are concentrated around the discontinuity to allow the estimators to detect it.
FIGURE 1. Examples of estimators $\hat f_n^{HTCV}$ obtained on $2^{10}$ observations. The true
distribution is represented in dashed lines. Figures from left to right correspond respectively
to Case 1, Case 2 and Case 3.

FIGURE 2. Examples of estimators $\hat f_n^{STCV}$ obtained on $2^{10}$ observations. The true
distribution is represented in dashed lines. Figures from left to right correspond respectively
to Case 1, Case 2 and Case 3.

Table 1 gives approximations by the Monte-Carlo method of the MISE. MISE values of the
estimators have the same order whatever the weak dependence case, and $\hat f_n^{STCV}$ is preferable in all
the cases.

                 MISE of the estimation
          Case 1      Case 2      Case 3
HTCV      0.096696    0.077064    0.097193
STCV      0.082934    0.06586     0.097184

TABLE 1. MISE approximated by MC on 500 simulations of samples of size $n = 2^{10}$.

In Figure 3, threshold levels are represented with respect to resolution levels. Their behavior
is similar in all cases: the threshold levels increase with the resolution level. For small
resolutions, the HTCV and STCV procedures are close, as $\lambda_j^2$ is negligible in $CV_j(\lambda)$. For high
resolutions this is also the case, as $\lambda_j$ is big enough to kill almost all the $\hat\beta_{j,k}$. Moreover, these figures
tend to confirm that threshold levels do not depend on weak dependence. Finally, note that
the curves do not behave like the square root of $j$, unlike the theoretical threshold levels given in Theorem 3.1.

FIGURE 3. Means of the proportions of the threshold levels obtained by cross-validation
with respect to the resolution levels for hard-thresholding (left) and soft-thresholding
(right). Case 1 corresponds to the solid line, Case 2 to the dashed line and Case 3 to
the dotted line.

After looking at the threshold level values, we give in Figure 4 the frequencies of the coefficients $\hat\beta_{j,k}$
that fall below $\lambda_j$, with respect to $j$. As these frequencies are not restricted to the two values 0
and 1, we can infer that neither $\hat f_n^{HTCV}$ nor $\hat f_n^{STCV}$ is equivalent to a linear estimator. This
is encouraging for both methods, as the results in Donoho et al. (1996) state that linear estimators are not
near-minimax. One can also see in these figures that the frequencies of effective thresholds are the same
across the weak dependence cases.
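For reference, the two thresholding rules and the proportion of thresholded coefficients at a given resolution level can be sketched as follows. This is a minimal illustration with hypothetical function names; the cross-validated choice of the threshold itself is not reproduced here.

```python
import numpy as np


def hard_threshold(beta, lam):
    """Keep coefficients whose absolute value exceeds lam, set the others to zero."""
    beta = np.asarray(beta, dtype=float)
    return np.where(np.abs(beta) > lam, beta, 0.0)


def soft_threshold(beta, lam):
    """Shrink every coefficient towards zero by lam, killing those below lam."""
    beta = np.asarray(beta, dtype=float)
    return np.sign(beta) * np.maximum(np.abs(beta) - lam, 0.0)


def proportion_thresholded(beta, lam):
    """Fraction of coefficients at one resolution level that fall below the threshold."""
    return float(np.mean(np.abs(np.asarray(beta, dtype=float)) <= lam))
```

A proportion strictly between 0 and 1 at some level means that the estimator keeps only part of the coefficients there, which is what distinguishes it from a linear projection estimator.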




FIGURE 4. Means of the proportions of thresholded coefficients with respect to the reso- 
lution levels for hard-thresholding (left) and soft-thresholding (right). Case 1 corresponds 
to the solid line, Case 2 to the dashed line and Case 3 to the dotted line. 



Finally, the means of the highest resolution levels $j_1$ are given in Table 2. According to Theorem
3.1, the values of this parameter do depend on the weak dependence case, but no significant
differences appear in the simulations.



Mean of $j_1$

          Case 1    Case 2    Case 3
HTCV      5.168     5.14      5.13
STCV      5.14      5.04      5.13

TABLE 2. Means of $j_1$ on 500 simulations of $n = 2^{10}$ observations.
5.4. Comparison with kernel estimators. As the results computed in the last Subsection are
systematically better for the STCV estimators than for the HTCV ones, we only present
results for the STCV estimators in the sequel. Let us now consider the case of a density function that is
a mixture of normal distributions. Like in , we compare the quality of the wavelet estimator
$\hat f_n^{STCV}$ with linear kernel estimators. The kernel used is the Epanechnikov kernel and we computed
two choices for its width parameter: firstly, the width given by Matlab's rule of thumb
(more precisely, the width is equal to $(q_3 - q_1)/(2 \times 0.6745) \times (4/(3n))^{1/5}$, where $q_1$ and
$q_3$ denote respectively the first and the third quartiles of the empirical distribution); secondly,
the width obtained by cross-validation on the mean integrated squared error risk.
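The rule-of-thumb width and the corresponding Epanechnikov kernel estimator can be sketched as below. The bandwidth follows the quartile formula stated above; the function names are hypothetical and the code is not claimed to reproduce Matlab's own implementation.

```python
import numpy as np


def rule_of_thumb_bandwidth(sample):
    """Width (q3 - q1) / (2 * 0.6745) * (4 / (3 n))^(1/5) from the formula in the text."""
    x = np.asarray(sample, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    return (q3 - q1) / (2 * 0.6745) * (4.0 / (3.0 * x.size)) ** 0.2


def epanechnikov_kde(sample, grid, bandwidth):
    """Kernel density estimate on `grid` with kernel K(t) = 0.75 * (1 - t^2) on [-1, 1]."""
    x = np.asarray(sample, dtype=float)[None, :]
    t = (np.asarray(grid, dtype=float)[:, None] - x) / bandwidth
    kernel = 0.75 * np.clip(1.0 - t ** 2, 0.0, None)
    return kernel.mean(axis=1) / bandwidth
```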

Figure 5 shows the means of the wavelet and kernel estimators in the different cases.
As in the last Subsection, there is no visual difference between the different cases of dependence.
In this figure, the mean of the kernel estimators fails to detect the two modes
of the density when the width is chosen according to the rule of thumb: the bandwidth is
overestimated in this case. The quality of the wavelet estimators $\hat f_n^{STCV}$ and that of the kernel estimators
with the width from a cross-validation procedure are visually equivalent.




FIGURE 5. Means of estimators $\hat f_n^{STCV}$ obtained from $2^{10}$ observations on 500 simulations.
The mean of the wavelet estimators is represented in dashed lines while the means of
the kernel estimators are represented respectively in line with dots for the rule of thumb
width (kernel estimator 1) and in dots for the cross-validation width (kernel estimator 2).
Figures from left to right correspond respectively to Case 1, Case 2 and Case 3.

To analyse more precisely the approximations achieved by $\hat f_n^{STCV}$ and by the kernel estimators, we
represent in Figure 6 the evolution of the mean $L^p$ risk with respect to $p$ for the three estimators
in each case of dependence, i.e. $\mathbb{E}(\|g - f\|_p^p)^{1/p}$ with $g$ equal to one of the three estimators.
Even if these risks are close to each other, the kernel estimator with cross-validation bandwidth has
the smallest risk for small values $p < 4$. Yet the approximations of this kernel estimator clearly get
worse for higher values of $p$, while the risks of $\hat f_n^{STCV}$ seem relatively stable for different values of $p$.
Concerning the kernel estimator with the width parameter taken according to the rule of thumb, the
$L^p$ risk is worse for small values of $p$ but comparable with that of the wavelet estimator for higher
values, even if the modes of the density are not detected.
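A Monte Carlo version of this mean $L^p$ risk can be sketched as follows, assuming the simulated estimators and the target density are evaluated on a common grid. The name `mean_lp_risk` and the trapezoidal quadrature are illustrative assumptions, and the placement of the expectation inside the $1/p$ power follows the reading of the formula given above.

```python
import numpy as np


def mean_lp_risk(estimates, truth, grid, p):
    """(Average over simulations of the integral of |g - f|^p) ** (1 / p)."""
    err = np.abs(np.asarray(estimates, dtype=float) - np.asarray(truth, dtype=float)) ** p
    dx = np.diff(np.asarray(grid, dtype=float))
    # Trapezoidal rule applied row by row (one row per simulated estimator).
    integrals = np.sum(0.5 * (err[:, 1:] + err[:, :-1]) * dx, axis=1)
    return float(np.mean(integrals) ** (1.0 / p))
```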



These graphs show that an advantage of $\hat f_n^{STCV}$ is that its mean $L^p$ risk seems stable for high
values of $p$. Nevertheless, its mean $L^2$ risk is larger than that of the kernel estimator
with cross-validation width, and its computation time is higher. One possible way to improve the
quality of approximation of the cross-validation procedure may be to consider different levels of
thresholding at each resolution level. We do not investigate this direction, as the computation
time then explodes.
FIGURE 6. Evolution of the $L^p$ risk of estimators $\hat f_n^{STCV}$ and kernel estimators obtained
on $2^{10}$ observations. The wavelet estimator is represented in dashed lines while the kernel
estimators are represented respectively in line with dots for the rule of thumb width (kernel
estimator 1) and in dots for the cross-validation width (kernel estimator 2). Figures from
left to right correspond respectively to Case 1, Case 2 and Case 3.



5.5. Different dependent samplings that do not satisfy (D). In this Subsection, we discuss
the necessity of Assumption (D). To this end, we study the convergence of the density estimators
on some dynamical systems that do not satisfy this assumption. More precisely, we focus on
Liverani-Saussol-Vaienti maps, see (flU ), defined as the solution of $X_t = T(X_{t-1})$ with

$$T(x) = \begin{cases} x(1 + 2^{\alpha'} x^{\alpha'}), & 0 \le x \le 1/2, \\ 2x - 1, & 1/2 < x \le 1, \end{cases} \qquad \text{for some } 0 < \alpha' < 1.$$

The process $(X_t)_{t\in\mathbb{Z}}$ is stationary and such that the covariance terms $\mathrm{Cov}(f(X_0), g(X_r))$ are of
order $r^{1-1/\alpha'}$, see (30) and the refinement in (flEI). Thus Assumption (D) is not satisfied in this
case and thresholded wavelet estimators are not minimax:

Proposition 5.1. Suppose that the father wavelet $\phi$ is such that $\int \phi > 0$ and that the assumptions
of Theorem 3.1 are satisfied. If $1 > \alpha' > 1/(2\alpha + 1)$ with $\alpha$ defined by (2.1), then for the
thresholded estimators of the marginal density of the Liverani-Saussol-Vaienti map of index $\alpha'$
there exists some $C > 0$ such that:

$$n^{2\alpha}\,\mathbb{E}\big[\|\hat f_n - f\|_2^2\big] \ge C, \quad \text{for } n \text{ sufficiently large.}$$

The same result also holds for the cross-validation thresholded estimator $\hat f_n^{STCV}$.



The proof of this Proposition is given in Section 6.3.

To simulate these dynamical systems, we draw $Z_0$ according to the Lebesgue measure on
$[0, 1]$, then we apply $T$ recursively to determine $Z_i$, and finally we set $(X_1, \ldots, X_n) = (Z_{n+1}, \ldots, Z_{2n})$.
This approximation of the stationary solution does not affect the study of the convergence rates,
as the dynamical system $Z$ is ergodic in mean with rate $O(n^{1-1/\alpha'})$, see Theorem 5 of (30).
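The simulation scheme just described can be sketched as follows, assuming the standard form of the map, $x(1 + 2^{\alpha'}x^{\alpha'})$ on $[0, 1/2]$ and $2x - 1$ on $(1/2, 1]$. The names `lsv_map` and `simulate_lsv` are hypothetical, and floating-point iteration of an expanding map is only an approximation of the true orbit.

```python
import numpy as np


def lsv_map(x, alpha):
    """One step of the map: x * (1 + (2x)^alpha) on [0, 1/2], 2x - 1 on (1/2, 1]."""
    if x <= 0.5:
        return x * (1.0 + (2.0 * x) ** alpha)
    return 2.0 * x - 1.0


def simulate_lsv(n, alpha, rng=None):
    """Draw Z_0 uniformly on [0, 1], iterate the map, and return (Z_{n+1}, ..., Z_{2n})."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.uniform(0.0, 1.0)
    path = np.empty(2 * n + 1)
    path[0] = z
    for i in range(1, 2 * n + 1):
        z = lsv_map(z, alpha)
        path[i] = z
    # The first n + 1 values are discarded as a burn-in period.
    return path[n + 1:]
```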



The analytic expression of the density $f$ is unknown, but it is proved to be continuous, locally
Lipschitz and to behave like $x^{-\alpha'}$ as $x \to 0$, see ((la) and (30). In particular, $f$ is unbounded
on $[0, 1]$. We therefore restrict our study to $[0.01, 1]$, where $f$ is bounded. As the true density $f$ is
unknown, we compare here the estimators $\hat f_n^{STCV}$ with other estimators. In Figure 7 we plot the
mean of $M = 100$ estimators $\hat f_n^{STCV}$ and Epanechnikov kernel estimators given by Matlab's rule
of thumb for 9 different values of $\alpha'$.






FIGURE 7. Means of the estimators $\hat f_n^{STCV}$ obtained on $2^{10}$ observations and 500 simulations.
Means of the kernel estimators are represented in dashed lines.

Visually, the means of both estimators are close to each other. To detect some difference between
the behaviors of the estimators, we compute the moments of order $k = 1, \ldots, 20$ of the
estimators integrated on $[0.01, 1]$:

$$\int_{0.01}^{1} \big(\mathbb{E}\, g^k(t)\big)^{1/k}\, dt,$$

where the random function $g$ is alternatively $\hat f_n^{STCV}$ or the kernel estimator.



For small values $\alpha' < 0.02$ and $k < 4$, the moments of both estimators have similar values.
But as $\alpha'$ grows, all the moments of $\hat f_n^{STCV}$ explode more rapidly than those of the kernel
estimators as $k$ increases. The previous simulation studies show no difference in behavior for $\hat f_n^{STCV}$
between independent cases and dependent cases satisfying (D). In dependent cases that do not
satisfy (D), the behavior of $\hat f_n^{STCV}$ depends on the decay rates of the covariance terms. In
contrast, the behavior of the kernel estimators with rule of thumb width is more stable when (D)
is not satisfied.



6. Proofs 



In this Section are collected all the proofs of this paper. 
FIGURE 8. Moments of the estimators $\hat f_n^{STCV}$ obtained on $2^{10}$ observations and 500
simulations. In dashed lines are represented the moments of the kernel estimators.



6.1. Proof of Lemma 3.1. Firstly we give the proof of inequality (3.1) of Lemma 3.1.
This proof is essentially based on Lemmas 6.1 and 6.2. The first one, Lemma 6.1, is simply
Theorem 2 of (9), which we recall here for completeness:

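Lemma 6.1. (Doukhan & Louhichi (1999)) Let $(Z_i)_{1 \le i \le n}$ be centered variables.
Let $q$ be an even integer and $n \ge 2$.

Suppose that for all $p = 2, \ldots, q$ and for all $1 \le s_1 \le \cdots \le s_p \le n$ satisfying $\max_i (s_{i+1} - s_i) = s_{u+1} - s_u = r$,
there exists $V_{p,n}$ such that:

$$n \sum_{r=0}^{n-1} (r+1)^{p-2}\, \mathrm{Cov}(Z_{s_1} \cdots Z_{s_u}, Z_{s_{u+1}} \cdots Z_{s_p}) \le V_{p,n}.$$

Then, we have

$$\mathbb{E}\Big|\sum_{i=1}^{n} Z_i\Big|^q \le \frac{(2q-2)!}{(q-1)!}\big\{V_{2,n}^{q/2} \vee V_{q,n}\big\}. \qquad (6.1)$$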



We refer the reader to (9) for the proof of this result. We apply this result to $Z_i = \psi_{j,k}(X_i) - \beta_{j,k}$
when (D) holds. We first determine the bounds $V_{p,n}$: under Assumption (D) we have

$$\sum_{r=0}^{n-1} (r+1)^{p-2}\, \mathrm{Cov}(Z_{s_1} \cdots Z_{s_u}, Z_{s_{u+1}} \cdots Z_{s_p}) \le \sum_{r=0}^{n-1} (r+1)^{p-2} \rho(r)\, p^2\, (2^{j/2} M_2)^{p-2}.$$

Here we need the following analytic Lemma to bound the quantity $\sum_{r=0}^{\infty} (r+1)^p \rho(r)$:
Lemma 6.2. If (D2) is satisfied, i.e. if $\rho(r) \le C_0 e^{-a r^b}$, then for all integers $p$ we have

$$\sum_{r=0}^{\infty} (r+1)^p \rho(r) \le C_1 C_2^p (p!)^{1/b}, \qquad (6.2)$$

with some constants $C_1$ and $C_2$ depending on $a$ and $b$.



Applying this result, whose proof is given at the end of this subsection, we directly obtain the
new bound:

$$\sum_{r=0}^{n-1} (r+1)^{p-2}\, \mathrm{Cov}(Z_{s_1} \cdots Z_{s_u}, Z_{s_{u+1}} \cdots Z_{s_p}) \le C_1 p^2 (2^{j/2} M_2 C_2)^{p-2} ((p-2)!)^{1/b}.$$

We set $V_{p,n} = C_1 p^2 (2^{j/2} M_2 C_2)^{p-2} ((p-2)!)^{1/b}\, n$ and, applying Lemma 6.1, we obtain that:

E| J2(Zi - EZ W < {2 { lZll {^ n ) q ' 2 } V [Ci{yl 2 M 2 C 2 y~ 2 {{q - 2)!) 1 /"} . 
Dividing by n~ q and noticing that 2 J / 2 < n for < j < logn, we derive that: 

m,k - P h k)\ q < nq/2 J^V {^) q/2 } V {Ci(M 2 C 2 y- 2 ((q - 2)1)^} , 
which corresponds to inequality (3.1). In particular, for $p = 2$ we have

$$\mathbb{E}\Big|\sum_{i=1}^{n} (Z_i - \mathbb{E} Z_0)\Big|^2 \le 4 C_1 n.$$

Now inequality (3.2) of Lemma 3.1 is a direct application of Theorem 1 in (10) with
$S_n = \sum_{i=1}^{n} (Z_i - \mathbb{E} Z_0)$, $t = n\lambda$, $\nu = 0$, $\mu = 1/b$, $A_n = 4 C_1 n$ and $B_n = 2 M_2 C_2\, 2^{(2+b)/b}\, 2^{j/2}$. We
refer the reader to (10) for the definition of the parameters $t$, $\nu$, $\mu$, $A_n$ and $B_n$.



Proof of Lemma 6.2. Define $g(x) = (1 + x)e^{-a x^b}$ for all $x \ge 0$. Studying its derivative, we can
easily see that there exists $x_{a,b}$ such that the function $g$ decreases on $[x_{a,b}, +\infty)$. If we denote by $k \ge 1$
the smallest integer greater than $x_{a,b}$, we can infer from (D2) that
00 /fc-l \ 

J2(r + l) p p(0 < C J2(r + l) p p(r) + / (a + l) p exp(-ax b )<ix 

r=0 \r=0 / 

(poo > 
C a , fe + y (x + l) p exp(-ax 6 )cte 



Using the convexity of $x \mapsto x^p$, we obtain the bound

$$\sum_{r=0}^{\infty} (r+1)^p \rho(r) \le C \left( C_{a,b} + 2^{p-1} \left( \int_0^{\infty} x^p \exp(-a x^b)\,dx + \int_0^{\infty} \exp(-a x^b)\,dx \right) \right).$$
Then, writing $u = a x^b$,

$$\sum_{r=0}^{\infty} (r+1)^p \rho(r) \le C \left( C_{a,b} + 2^{p-1} b^{-1} \left( a^{-(p+1)/b}\, \Gamma\Big(\frac{p+1}{b}\Big) + a^{-1/b}\, \Gamma\Big(\frac{1}{b}\Big) \right) \right),$$

with r defined by T(x) = u x ~ 1 exp(— u)du for all x > 0. Let \x~\ denotes the largest integer 
smaller than x > and note T = sup^g^i] Using the inequalities T((p + l)/b) < T\(p + 

l)/b] \ < Te p/b (p\) 1/b and the fact that this last bound is also available for T(l/b), we have: 

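$$\sum_{r=0}^{\infty} (r+1)^p \rho(r) \le C \left( C_{a,b} + 2^{p-1} b^{-1} \big( a^{-(p+1)/b} + a^{-1/b} \big)\, \bar\Gamma\, e^{p/b} (p!)^{1/b} \right)$$

$$\le C \big( C_{a,b} + 2^{-1} b^{-1} a^{-1/b}\, \bar\Gamma \big) \big( 2 e^{1/b} (a^{-1/b} \vee 1) \big)^p (p!)^{1/b}.$$

The Lemma follows immediately by choosing appropriate constants. □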
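The change of variables $u = a x^b$ used in this proof gives $\int_0^\infty x^p \exp(-a x^b)\,dx = b^{-1} a^{-(p+1)/b}\, \Gamma((p+1)/b)$. The sketch below checks this identity numerically on two simple cases; the function names, the truncation point `upper` and the grid size are illustrative assumptions.

```python
import math

import numpy as np


def closed_form(p, a, b):
    """b^{-1} * a^{-(p+1)/b} * Gamma((p+1)/b), obtained with the substitution u = a x^b."""
    return a ** (-(p + 1) / b) * math.gamma((p + 1) / b) / b


def numeric_integral(p, a, b, upper=60.0, n=600001):
    """Trapezoidal approximation of the integral of x^p exp(-a x^b) over [0, upper]."""
    x = np.linspace(0.0, upper, n)
    y = x ** p * np.exp(-a * x ** b)
    dx = x[1] - x[0]
    return float(dx * (y.sum() - 0.5 * (y[0] + y[-1])))
```

For instance, with $p = 2$ and $a = b = 1$ both functions should return a value close to $\Gamma(3) = 2$.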



6.2. Proof of Theorem 3.1. The proof of Theorem 3.1 is very similar to the one in the iid case
given in (0). In the sequel, $C$ denotes a positive real number that does not depend on $n$, $j$ nor $k$.
Its value may vary from one equation to another. Let us fix $1 \le p < \infty$ and consider only the cases
where $1 \le \pi \le p$. The cases where $\pi > p$ follow from the case $\pi = p$ by applying Jensen's inequality
to the error term $\mathbb{E}\|\hat f_n - f\|_p^p$. Theorem 3.1 provides the convergence rate for $f$ belonging to the
Besov ball $B^s_{\pi,r}(M_1)$. In particular $f \in L^2$ and it can be written as:

$$f = \underbrace{\sum_{k=0}^{2^{j_0}-1} \alpha_{j_0,k}\, \phi_{j_0,k}}_{E_{j_0} f} + \underbrace{\sum_{j=j_0}^{\infty} \sum_{k=0}^{2^j-1} \beta_{j,k}\, \psi_{j,k}}_{D_{j_0} f}.$$

We decompose the estimators $\hat f_n$ of $f$ in the same way:

$$\hat f_n = \underbrace{\sum_{k=0}^{2^{j_0}-1} \hat\alpha_{j_0,k}\, \phi_{j_0,k}}_{\hat E_{j_0} f} + \underbrace{\sum_{j=j_0}^{j_1} \sum_{k=0}^{2^j-1} \gamma_{\lambda_j}(\hat\beta_{j,k})\, \psi_{j,k}}_{\hat D_{j_0,j_1} f},$$

where $\gamma_{\lambda_j}$ denotes without distinction the soft and hard-threshold functions.

Thanks to Minkowski's inequality, the risk of $\hat f_n$ is divided in two terms:

$$\mathbb{E}\big[\|\hat f_n - f\|_p^p\big] \le 2^{p-1} \Big( \underbrace{\mathbb{E}\big[\|\hat E_{j_0} f - E_{j_0} f\|_p^p\big]}_{T_1} + \underbrace{\mathbb{E}\big[\|\hat D_{j_0,j_1} f - D_{j_0} f\|_p^p\big]}_{T_2} \Big).$$

To study the convergence rates of these terms, the main tools are the following Lemmas, given
respectively in (21) and (0). Here for any $p \ge 1$ we denote by $\|\cdot\|_{\ell_p}$ the $\ell_p$-norm defined by
$\|a\|_{\ell_p}^p = \sum_i |a_i|^p$ for any sequence of real numbers $(a_i)_{i \ge 0}$.
Lemma 6.3. Let $\delta$ denote with no distinction $\phi$ and $\psi$. For any $1 \le p < \infty$, there exist $c_1, C_2 > 0$
such that for all $j \ge 0$ and all sequences $(a_k)_{0 \le k \le 2^j-1}$ we have



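$$c_1\, 2^{j(p/2-1)} \|a\|_{\ell_p}^p \le \Big\| \sum_{k=0}^{2^j-1} a_k\, \delta_{j,k} \Big\|_p^p \le C_2\, 2^{j(p/2-1)} \|a\|_{\ell_p}^p.$$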

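Lemma 6.4. For any $1 \le p < \infty$ there exists a constant $C > 0$ such that for all $0 \le j_- \le j_+$
and any triangular array $(a_{j,k})_{j_- \le j \le j_+,\, 0 \le k \le 2^j-1}$ we have

$$\Big\| \sum_{j=j_-}^{j_+} \sum_{k=0}^{2^j-1} a_{j,k}\, \delta_{j,k} \Big\|_p^p \le C \begin{cases} \displaystyle\sum_{j=j_-}^{j_+} 2^{j(p/2-1)} \|a_j\|_{\ell_p}^p, & \text{if } 1 \le p \le 2, \\[2ex] \displaystyle\Big( \sum_{j=j_-}^{j_+} 2^{j\beta p/(p-2)} \Big)^{(p-2)/2} \sum_{j=j_-}^{j_+} 2^{j(p/2-1-\beta p/2)} \|a_j\|_{\ell_p}^p, & \text{if } p > 2. \end{cases}$$

Here $\beta$ is an arbitrary real number.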



6.2.1. Bias of scale estimation $T_1$. Lemma 6.3 for $\delta = \phi$ and $a_k = \hat\alpha_{j_0,k} - \alpha_{j_0,k}$ yields the existence
of $C > 0$ such that



Ti = E 



2 J -1 



a jo,k)<ftjo,k 



k=0 



2n-l 



fc=0 



Thanks to Lemma 13. 11 the term T\ is bounded by 

Note that the choice of jo in Theorem 13.11 implies that the order of the bound is (2 j0 /n) p / 2 < 
(j n -pN/(2+2N) ne giigible compare with n~ ps ^ 1+2s "> thanks to the hypothesis N > 2s. Note that 



1 + 



a + 2sp7r(l + 2(s - l/vr)) 



(6.3) 



so that if $\epsilon > 0$ then $\alpha_+ < \alpha_-$, if $\epsilon < 0$ then $\alpha_+ > \alpha_-$, and if $\epsilon = 0$ then $\alpha_+ = \alpha_-$. We
conclude that for all possible choices of $\epsilon$ the term $T_1$ is negligible.



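6.2.2. Details term $T_2$. The proof is based on multiple applications of Lemma 6.4 with $a_{j,k} =
\gamma_{\lambda_j}(\hat\beta_{j,k}) - \beta_{j,k}$ and different $j_-$ and $j_+$. Studying the expectation of the loss, according to Lemma
6.4 and from the linearity of the expectation, the key terms of the bounds are

$$\mathbb{E}\big\|\gamma_{\lambda_j}(\hat\beta_{j,\cdot}) - \beta_{j,\cdot}\big\|_{\ell_p}^p = \sum_{k=0}^{2^j-1} \mathbb{E}\big|\gamma_{\lambda_j}(\hat\beta_{j,k}) - \beta_{j,k}\big|^p.$$

The following Lemma gives upper bounds for the terms $\mathbb{E}|\gamma_{\lambda_j}(\hat\beta_{j,k}) - \beta_{j,k}|^p$ for any $j, k$: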
Lemma 6.5. Under the assumptions of Theorem 3.1 there exists a constant $C > 0$ such that
for all $j_0 \le j \le j_1$ and all $0 \le k \le 2^j - 1$ we have

$$\mathbb{E}\big|\gamma_{\lambda_j}(\hat\beta_{j,k}) - \beta_{j,k}\big|^p \le C\,(B1 \wedge B2 \wedge B3),$$

where $B1 = |\beta_{j,k}|^p$, $B2 = |\beta_{j,k}|^{\pi} \lambda_j^{p-\pi}$ and $B3 = \lambda_j^p$.

Proof. From the definition of the hard and soft-threshold functions 7^ . we have : 

|7a,&) - M p < ^ (fe - M p + ^ jM >x j} + \M p \r^\<^r (6 - 4) 

The idea is to introduce the difference \@j k ~ Pj.k\ an d bound it using Lemma [3,ll More precisely, 
we have 

1 {■>.., A.} ^ 1 {|^, fc -/3,, fc |>A J /2} + 1 {|/3 J , fc |>A J /2}, 

and the expectation of the first term of the bound in (|6.4p is bounded by: 

(eI^,* - jjk \ 2p ) 1/2 (n\Pj,k - Pj,k\ > A,/2)) 1/2 + IXjMlPj.k - Pj,k\ > Xj/2) 

+ (E|j9 if *-/3 if *P + |A,|*)l { | /Jiiik |>A, /2} . 

Inequality (|3.ip in Lemma 13.11 provides that W\f3j y k — Pj,k\ p — Cn~ p l 2 and consequently the 
expectation terms of the sum are smaller than |Aj| p . Applying f|3.2|) of Lemma 13.11 with A = Xj 
satisfying Xjy/n = Kj, we infer the existence of a constant c > such that 

(p(\d J ,k-M>^m) 1/2 <C2~ cKj 

from the inequality 

(2^ 2 /Vn~) b < (logn)- 26 - 3 < C 3 - 2h -\ 

Using l { ,^ fc |< Aj} < 1 {|^, fc - /3 ,, fc |>A J } + 1 {|/3 J , fc |<2A 3 }, the expectation of the second term of the 
bound in (|6.4p is lower than 

\m p (nfe - m > a,) + * m<h \<2x j} ) ■ 

As above the probability term is bounded by $2^{-cK_j}$.

Using all the previous bounds in inequality (6.4) leads to an upper bound for $\mathbb{E}|\gamma_{\lambda_j}(\hat\beta_{j,k}) - \beta_{j,k}|^p$
with the expression $B(p,j,k) + (\lambda_j^p + |\beta_{j,k}|^p)\,2^{-cK_j}$ where

$$B(p,j,k) = \lambda_j^p\, \mathbf{1}_{\{|\beta_{j,k}| > \lambda_j/2\}} + |\beta_{j,k}|^p\, \mathbf{1}_{\{|\beta_{j,k}| \le 2\lambda_j\}}.$$

We investigate three ways of bounding $B(p,j,k)$, using the fact that $\mathbf{1}_{\{a \le b\}} \le (b/a)^{\alpha}$ for all
$a, b > 0$ and all $\alpha > 0$:

B1: $B(p,j,k) \le C|\beta_{j,k}|^p$ using the indicator in the first term of the sum,
B2: $B(p,j,k) \le C|\beta_{j,k}|^{\pi} \lambda_j^{p-\pi}$ with $a = \lambda_j$, $b = |\beta_{j,k}|$ and $\alpha = \pi$ in the first term and
$a = |\beta_{j,k}|$, $b = \lambda_j$ and $\alpha = p - \pi$ in the second term of the sum,
B3: $B(p,j,k) \le C\lambda_j^p$ using the indicator in the second term of the sum.

Thanks to a suitable choice of $K$ it is obvious that $(\lambda_j^p + |\beta_{j,k}|^p)\,2^{-cK_j}$ is smaller than $(B1 \wedge B2 \wedge
B3)$. The result of Lemma 6.5 follows. □
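As an illustration, the bound B2 can be worked out explicitly. The lines below assume that $B(p,j,k)$ reads $\lambda_j^p \mathbf{1}_{\{|\beta_{j,k}| > \lambda_j/2\}} + |\beta_{j,k}|^p \mathbf{1}_{\{|\beta_{j,k}| \le 2\lambda_j\}}$, that $p > \pi$ and that $\beta_{j,k} \ne 0$ (the bound is trivial otherwise); the explicit constants are only indicative.

```latex
% Worked bound B2, using 1_{a <= b} <= (b/a)^alpha for a, b > 0 and alpha > 0.
\begin{align*}
\lambda_j^p\,\mathbf{1}_{\{\lambda_j/2 < |\beta_{j,k}|\}}
  &\le \lambda_j^p \Big(\frac{2|\beta_{j,k}|}{\lambda_j}\Big)^{\pi}
   = 2^{\pi}\,|\beta_{j,k}|^{\pi}\,\lambda_j^{p-\pi}, \\
|\beta_{j,k}|^p\,\mathbf{1}_{\{|\beta_{j,k}| \le 2\lambda_j\}}
  &\le |\beta_{j,k}|^p \Big(\frac{2\lambda_j}{|\beta_{j,k}|}\Big)^{p-\pi}
   = 2^{p-\pi}\,|\beta_{j,k}|^{\pi}\,\lambda_j^{p-\pi}, \\
B(p,j,k) &\le \big(2^{\pi} + 2^{p-\pi}\big)\,|\beta_{j,k}|^{\pi}\,\lambda_j^{p-\pi}.
\end{align*}
```

The bounds B1 and B3 follow in the same way, bounding $\lambda_j$ by $2|\beta_{j,k}|$ in the first term and $|\beta_{j,k}|$ by $2\lambda_j$ in the second term respectively.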



The end of the proof of Theorem 3.1 is based on successive uses of B1, B2 and B3 depending
on the resolution levels $j$.

Let us first consider the highest multiresolution levels $j > j_+$, where B1 is the most efficient
bound given in Lemma 6.5. The integer $j_+$ will be fixed later. From Minkowski's inequality we
obtain the following bound for $T_2$, up to a constant:



3+ 2J-1 



3=30 k=0 



2-7-1 



2^-1 



e EE - + E E E ^ (PiM* -EE M. 



3+<3<h k=0 



j+<j k=0 



T21 



T-22 



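The term $T_{22}$ is bounded by applying Lemma 6.4 with $j_- = j_+ + 1$, $j_+ = \infty$ and $a_{j,k} = \gamma_{\lambda_j}(\hat\beta_{j,k}) - \beta_{j,k}$
for $j_+ < j \le j_1$ and $0 \le k \le 2^j - 1$, $a_{j,k} = -\beta_{j,k}$ for $j > j_1$ and $0 \le k \le 2^j - 1$. Using Lemma
6.5 with B1, we achieve that $\mathbb{E}\|a_j\|_{\ell_p}^p \le C\|\beta_j\|_{\ell_p}^p$ for all $j > j_+$ and we have: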



T 22 <C< 



j>j+ 

\ (P-2J/2 

^2 2 jl3p/{p ' 2) j V 2J(p/ 2 ~ 1 -' 3 p/ 2 )| 



if 1 < p < 2, 



3X, ifp>2. 



j>3+ 



The Sobolev inclusion C Bp r C £>p'oo with s' = s — l/V + l/p leads us to choose (5 = —2s'. 
Noting that J2 j>j+ 2~ 32s 'p^ p ~ 2 ^ < C2-i+ s ' p2 l { v- 2 ^ we obtain the inequalities: 

00 

T 22 < C2~ 3 + S ' p E 2 j(s ' p+p/2 - 1) 11^-11^ < C||/||^ Pj00 2-^ s 'p. 

3=3+ 

We can choose $j_+$ as the largest integer such that

$$2^{j_+} \le \Big(\frac{n}{\log n}\Big)^{\alpha/s'}.$$

We have to check that $j_+ \le j_1$ for $n$ sufficiently large, i.e. $s' > \alpha$. When $\epsilon \le 0$ we have
$\alpha = s'/(1 + 2(s - 1/\pi))$ and $\alpha < s'$ because $s > 1/\pi$ by hypothesis. When $\epsilon > 0$, equality (6.3)
implies that $\alpha_+ < \alpha_- < s'$ and then obviously $s' > \alpha$ for all the possible values of $\epsilon$. With this
choice of $j_+$, the rate of convergence of $T_{22}$ is then the one stated in the Theorem.
We study the convergence of $T_{21}$ using the bounds B2 and B3 given in Lemma 6.5. Let us
investigate further these two cases in order to compare their efficiency. On the one hand, using
B2 and noting that $\|f\|_{s,\pi,\infty} \le \|f\|_{s,\pi,r} < \infty$, we obtain the inequality:

On the other hand, using B3 we have directly

The rates of these two upper bounds are equivalent when $\lambda_j^2 = 2^{-j(2s+1)}$. Replacing $\lambda_j$ by its
value, it follows that B2 is more efficient for $j > j_-$, with $j_-$ the largest integer such that

$$2^{j_-} \le \Big(\frac{n}{\log n}\Big)^{1/(1+2s)}.$$

We check that $j_- \le j_+$ for all $n$ and all possible values of $\epsilon$, as $\alpha = s/(1 + 2s) \ge s'/(1 + 2s)$ if
$\epsilon \ge 0$ and $s'/\alpha = 1 + 2s - 2/\pi \le 1 + 2s$ if $\epsilon \le 0$.

We decompose again $T_{21}$ using Minkowski's inequality. It gives the following upper bound, up
to a constant:

j- V-\ 2 ] -l 



3=30 k=0 



E E E Wj,k) - PjM* +n E E - 



3-<3<3+ fe=0 



T211 T212 

We control the term $T_{211}$ using B3 according to the discussion above and applying Lemma 6.4
with $\lambda_j^2 \le C \log n / n$:

$$T_{211} \le C (\log n / n)^{p/2} \begin{cases} \displaystyle\sum_{j=j_0}^{j_-} 2^{jp/2}, & \text{if } 1 \le p \le 2, \\[2ex] \displaystyle\Big( \sum_{j=j_0}^{j_-} 2^{j\beta p/(p-2)} \Big)^{(p-2)/2} \sum_{j=j_0}^{j_-} 2^{j(p/2 - \beta p/2)}, & \text{if } p > 2. \end{cases}$$

Let us choose $\beta = 1/2$ in order to obtain the inequalities

$$T_{211} \le C (\log n / n)^{p/2}\, 2^{j_- p/2} \le C (\log n / n)^{p(1 - 1/(2s+1))/2} \le C (\log n / n)^{p\alpha} \quad \text{if } \epsilon \ge 0.$$

This term is negligible if $\epsilon < 0$ using (6.3).

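To conclude, it remains to bound $T_{212}$ using Lemma 6.4 and B2. We use that $\lambda_j^2 \le C \log n / n$
and we let appear the symbol $\epsilon = s\pi - (p - \pi)/2$:

$$T_{212} \le C (\log n / n)^{(p-\pi)/2} \begin{cases} \displaystyle\sum_{j_- < j \le j_+} 2^{-j\epsilon}, & \text{if } 1 \le p \le 2, \\[2ex] \displaystyle\Big( \sum_{j_- < j \le j_+} 2^{j\beta p/(p-2)} \Big)^{(p-2)/2} \sum_{j_- < j \le j_+} 2^{-j(\epsilon + \beta p/2)}, & \text{if } p > 2. \end{cases}$$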

From now on, we have to distinguish the cases where $\epsilon \ne 0$ from the case where $\epsilon = 0$.

If $\epsilon \ne 0$: Let us take $\beta = -\epsilon/p$. Then if $\epsilon < 0$, we obtain the inequalities:

$$T_{212} \le C (\log n / n)^{(p-\pi)/2}\, 2^{-j_+ \epsilon} \le C (\log n / n)^{\alpha\epsilon/s' + (p-\pi)/2},$$

and we conclude using the equality $\alpha\epsilon/s' + (p - \pi)/2 = p\alpha$. If $\epsilon > 0$, then

$$T_{212} \le C (\log n / n)^{(p-\pi)/2}\, 2^{-j_- \epsilon} \le C (\log n / n)^{p(1 - 1/(1+2s))/2} \le C (\log n / n)^{p\alpha}.$$

If $\epsilon = 0$: Then $p/2 = \pi s + \pi/2$ and as $s\pi > 1$ we deduce that $p > 2$. Moreover we notice
that $(p - \pi)/2 = \alpha p$ and thus

2^/ 2 - 1 )E|| 7Aj (^.) -Pj\\ p ep < cx^^WPjWi < C \f p 2^ s+ *i 2 -v 

Let us denote tj = 2-?( 7rs + 7r / 2 - 1 ) Y?k=o Wj \\\ ■ Notice that from the definition of the Besov 
norms we have < C\\f\\ r s ,-K,r- From Lemma El with = we achieve 

T 2 n < Cilogn/n^jf- 1 £ t jt as p > 2. 

j-<j<j+ 

If r < 7r then Y2j_<j<j £j < C and the result of Theorem 13.11 follows. If ir < r we use 
Holder's inequality at the powers r/n and r/(r — tt): 

/ . \ 1— Tr/r 

E «j < C||/||^ r I £ ^^"^ ) < Cj^ ,r < C(logn) 1 -^. 



3=30 



6.3. Proofs of results given in Sections 5 and 4. In this Section are collected the proofs of
Proposition 4.1, Lemma 4.1, Proposition 4.2 and Proposition 5.1. Denoting with no distinction
$\psi_{j,k}(x) - \beta_{j,k}$ and $\phi_{j,k}(x) - \alpha_{j,k}$ as $\delta_{j,k}(x)$ for any $j, k$, we collect here some inequalities useful
in this Section: $\mathbb{E}|\delta_{j,k}(X_0)| \le 2\|f\|_\infty \|\delta\|_1 2^{-j/2}$, $\mathbb{E}|\delta_{j,k}(X_0)|^2 \le \|f\|_\infty$, $\|\delta_{j,k}\|_\infty \le 2\|\delta\|_\infty 2^{j/2}$,
$\mathrm{Lip}\,\delta_{j,k} \le \mathrm{Lip}(\delta)\, 2^{3j/2}$ and $\|\delta_{j,k}\|_{BV} \le (\|\delta\|_\infty + A\,\mathrm{Lip}\,\delta)\, 2^{j/2+1}$ for all $j \ge 1$. The last assertion comes
from the fact that $\delta_{j,k}$ is a bounded Lipschitz function supported by $[(-A + k)2^{-j}, (A + k)2^{-j}]$.


Proof of Proposition \4-l\ As for any j, k the function has bounded variations we have 
Cov \6 j>k (X sl ) ■ ■■S jtk (X s J,6 j>k (X Su+1 )- ■ ■ 6 jtk (X Su+v )j < vE 5 j)k (X Sl )- ■ -S jt k (X Su ) \\8j >k \\BvMr). 

Noticing that H^fcHoo < H^fcH Wj it follows 

Cg(r) < «[|4*[|^- 2 [|%,*||BvE|? i)fc (Xo)|^(r) 

< (n + „ + n? ;)(2^+i c (|| 5 || oo + ALip^r^-^H/IUII^I^-i/ 2 exp(-ar 6 ). 
Then Proposition 14.11 is proved. □ 

Proof of Lemma \4-l\ The proof is very close to the one given in (0) • First notice that (|4.3|) for 
r = implies that c > 1 = sup sgB ^ ||(?||.By- From f|4.2j) we infer that <fi(cr(Xo),X r ) < exp(ar~ fc ) 
applying Lemma 4 of (6) on (|4.2p . From the Markov property we get (f)(a({Xj,j < 0}),X r ) = 
(/)(cr(Xo),X r ). Now for any £ < 1, for all r < «i < • • • < %n consider any g^. E -BVi for 1 < j < £. 
Let us prove that we can restrict ourselves to the duplets satisfying $i_1 < \cdots < i_\ell$. On the
one hand, if ij = if = r then we have \\E(g r (X r )g r (X r )\Xi 1 = -)\\bv — c\\dr\\BV from f|4.3|) . As 
from assumption ||<7 r ||oo — \\9t\\bv < 1 then H^II-BV — \\9r\\BV + HflV llsv — 2 and we achieve the 
bound \\K(g r (X r )g r (X r )\Xi 1 = -)I|bv < 2c. On the other hand, if ij < iji then, from the Markov 
property, we get the equation 

HE^.^.)^,^,)!^ = OIIbv = ||E(^.(^)E( % (X v )|^)|X n = 

We proceed in two steps. Firstly from (14, 3ft . we infer thet x t— > gi jt i.,(x) = E(gt ., (Xj ., ) l-Xy = x) 
has variations bounded by c. Notice also that Hfl^-,* ||oo — llffivlloo — 1 an d therefore we deduce 
the bound Wg^g^,^, \\ bv < ||<% WBV+hij,^ \\bv < 1+c Secondly, using (JO]) on g^ ti .,/ Hfl^., \\bv, 
we infer that IIE^X^,^.,)^ = -)\\ BV < c(l + c) < c + c 2 . 

As c > 1, this bound is larger than the one in the case ij = ij/. Now, by straightforward 
recurrences in the worst cases i\ < ■ ■ ■ < ig, we get that 

HEG^PQJ • • ■ g ie (X ie )\X H = -)||bv < c + • • • + c e < iS as c < 1. 

Then, denoting (/^...^(a?) = ^(9ii{Xh) • • " <7i^ (-^ii)|-^u = ^0 f° r an x ) we have almost surely the 
equation 

E( gil (X h ) ■ ■■g k (X ie )\X ) - E(g n (X h ) • • ■ g h {X h )) = E(g h _ Cl (X h )\X ) - E( ftli ..., Cf (Xh)) 
From the definition of coefficients (f) we get 

||E(< fe ,..., Q (X il )|X ) -E(<7<i,..,«*(*ii))IU < ec^(a({Xj,j < 0}),X r ). 

For all 1 < I < v this bound holds uniformly for all g^ S BV\, 1 < j ' < £, using the definition of 
4> v (r) we conclude that 4> v {r) < c"4>{o{{Xj,j < 0}),X r ) < c v exp(-ar 6 ). □ 



Proof of Proposition \4-2\ We use the direct bound given in the proof of Lemma 1 of (|23l ) under 
(J): 

Cov (s j>k (X h ) ■ ■■8 j>k (X iu ),d j>k (X iu+1 ) ■ ■■5 j , k (X iu+v j) | < 2 3 (2^2 2 ||5|| 0O )«+"- 2 7 (r) 

with 7 (r) = E\5j ;k (X )6j, k (X r )\V(E\6 j;k (X )\) 2 < (||/r||oo V2||/|| £XJ )||J|| 2 2^'. Noticing that 7 A (3 < 
7 i/4 /? 3/4 for 

any positive numbers 7 and (5, we combine the two bounds on the covariance terms 
and we infer that (|D1[) is satisfied with M2 = 2 1| ^|| and 

p(r) = 2 9 / 2 ||5||; /2 (Liptf) 1 /2(||/ r || 00 v \\f\U 3/A X(r)y\ 

Assumptions of Proposition 14.21 on respective decrease and increase rates of A(r) and ||/ r ||oo yield 
the existence of Co > such that p(r) < Coexp(— a'r h /4). □ 



Proof of Proposition 5.1. We give the proof for $\hat f_n$ but it also holds for $\hat f_n^{STCV}$. We begin by
recalling the result of Corollary 7.1 in Gouezel (2004):

Lemma 6.6 (Gouezel, 2004). For any Lipschitz function $\delta_1$ and bounded measurable function $\delta_2$
such that $\delta_1, \delta_2 = 0$ in a neighborhood of $0$, and for any $0 < \alpha' < 1$, there exists some constant
$C > 0$ such that

$$\mathrm{Cov}(\delta_1(X_0), \delta_2(X_r)) \sim C \int \delta_1(x)\,dx \int \delta_2(x)\,dx \; r^{1 - 1/\alpha'} \quad \text{when } r \to \infty. \qquad (6.5)$$

We use the decomposition of $f \in L^2$ in the orthogonal basis of wavelet functions and we obtain

E dl/n - /III) > E|| ( a io~ k ~ a 3o~k)<i>3o,k\\l 

k£Sj 

> ^2 E(%_ fe - ay _ fc ) 2 . 
keSj Q 

If we develop K(Sj _ k — a.j Q - k ) 2 using the covariance terms and denoting Sj >k (x) = 4>j, k {x) — &j,k 
for x £ [0.01, 1] and null elsewhere, it comes: 

^ n— 1 

E(% _ fe - a ]Q _ k ) 2 = ~E(6 jjk (Xi) 2 ) +2j2 1 ^Cov(6 j0tk (X ),6 j0>k (X r )). 

r=l 



We want to apply Lemma 16.61 with 61 = 62 = 6j 0jk . We check easily the assumptions of this 
Lemma because of the definition of 6j 0tk , resulting from the fact that we estimate the density on 
[0.01, 1]. Moreover J 4>j . k = 2~i°l 2 J </> with J (ft > from assumption and then the covariance 
terms Cov(6j 0jk (Xo), 6j 0tk (X r )) are equivalent to r 1-1 /"' for some C > as r goes to infinity. 



Let no be such that for all n > no, u r ^ n = ,n ^-Cov(6j ^ k (X ),6j 0jk (X r )) is nonnegative. For 
some m > m' > 2, we decompose the sum of covariance terms, for n sufficiently large, in four 
sums: 

n— 1 n [n/m] [n/m'] n-1 

^ — —Cav(5j 0t k{X ),6j 0: k{X r )) = ^« r ,n + ^ Ur ' n + Ur > n+ Ur ^ 

r=l r=l r=n r=[n/m] r=[n/m'] 

where [a] denotes the integer part of a. The first term goes to with rate n. Then, by definition 
of no, the second and the last terms are nonnegative. Concerning the sum Y^r—\n/m] Ur > n > the 
summands all larger than (m! — 2) / (m' n)Cov(6j 0jk (Xo) , 6j 0jk (X r )) that is equivalent to 

C2~ j0 n~ 1 r 1 ~ 1 ' a! as n — * 00. The minimax rate a is such that a < 1/2 and then by hypothesis 
a' > l/(2a + l) > 1/2. Consequently, when n goes to infinity, the sum Y^r=\n/m] u ' r < n 1S l ar S er than 
a partial sum equivalent to C2 -J0 n 1 ~ 1 / Q: ' for some C > as n — > 00. As we assume 2a > 1/a' — 1 
we obtain the existence of some C > such that 



n-l 

n 2a 2 jo ^2 — — Gov(6 j(hk (X ), 6 j0tk (X r )) > C, for n sufficiently large. 



r=l 
Collecting these facts and using that $|S_{j_0}|$ is, up to a constant, equal to $2^{j_0}$, we obtain that
$n^{2\alpha}\,\mathbb{E}(\|\hat f_n - f\|_2^2)$ is larger than some positive constant, and the result of Proposition 5.1 is proved.

□